Did Moni Nikolove Deserve to be NCAA Volleyball Player of the Year?
How does the NCAA Volleyball Player of the Year Get Chosen?
Introduction
For the last 3 years, I’ve been obsessed with volleyball. Growing up, I played tennis and basketball and dropped them in college as life got in the way. During my Master’s program, I found an intramural group that played volleyball and it was like combining the best parts of tennis and basketball. The non-contact aspect of tennis (who’s trying to get hurt at age 23?!) and the jumping aspect of basketball. I fell in love immediately and began to consume volleyball content whatever way I can. One of those ways involved watching a lot of NCAA Men’s Volleyball.
If you’re obsessed with volleyball like me, you know that Moni Nikolov took the NCAA by storm last year. As a 17 year old 6’9 starter for the Bulgarian national team, he quickly made his presence known in America. It just seemed like he was leagues above everyone else, eventually leading to a chip and a NCAA Player of the Year award to wrap it all up. (I actually had the chance to go watch Moni Nikolov last year at Lewis University set the collegiate record for serve speed!)
Now if you watched any of his games, you’d be like, yeah, its super obvious but does the data back it up? That’s what I’m going to try to answer today. A quick Google search didn’t give a satisfying answer, a lot of it was for women’s volleyball or men’s basketball. And for men’s basketball, it seems its determined by a panel of media members or a selection committee.
I’m sort of surprised the SEO for using data science to pick the player of the year (atleast for basketball) isn’t first page worthy. (Google fix your algo)
So let’s do it ourselves! I found this R package called ‘naccvolleyballr’ which is basically an api wrapper for volleyball stats from the NCAA website and fortunately for us, it has all player statistics for the 2024 season. Let’s take a quick look at what variables we have with the skimr package.
| Name | mvb_playerseason_div1_202… |
| Number of rows | 1077 |
| Number of columns | 23 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 18 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| team | 0 | 1 | 3 | 19 | 0 | 66 | 0 |
| conference | 0 | 1 | 3 | 20 | 0 | 9 | 0 |
| player | 0 | 1 | 7 | 24 | 0 | 1074 | 0 |
| yr | 0 | 1 | 2 | 3 | 0 | 5 | 0 |
| pos | 0 | 1 | 1 | 4 | 0 | 9 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| height_cm | 25 | 0.98 | 191.93 | 8.36 | 165.1 | 187.96 | 193.04 | 198.12 | 213.36 | ▁▃▇▇▁ |
| gp | 0 | 1.00 | 16.62 | 8.96 | 1.0 | 9.00 | 18.00 | 24.00 | 33.00 | ▆▃▅▇▃ |
| gs | 0 | 1.00 | 9.64 | 10.34 | 0.0 | 0.00 | 5.00 | 20.00 | 33.00 | ▇▂▂▃▁ |
| s | 0 | 1.00 | 50.33 | 33.90 | 1.0 | 18.00 | 50.00 | 81.00 | 121.00 | ▇▅▅▆▂ |
| kills | 0 | 1.00 | 67.36 | 92.42 | 0.0 | 2.00 | 24.00 | 100.00 | 525.00 | ▇▂▁▁▁ |
| errors | 0 | 1.00 | 27.55 | 37.69 | 0.0 | 1.00 | 10.00 | 36.00 | 173.00 | ▇▁▁▁▁ |
| total_attacks | 0 | 1.00 | 156.74 | 208.17 | 0.0 | 7.00 | 62.00 | 231.00 | 1051.00 | ▇▂▁▁▁ |
| hit_pct | 93 | 0.91 | 0.21 | 0.26 | -1.0 | 0.10 | 0.23 | 0.33 | 1.00 | ▁▁▆▇▁ |
| assists | 0 | 1.00 | 62.82 | 190.00 | 0.0 | 1.00 | 6.00 | 20.00 | 1229.00 | ▇▁▁▁▁ |
| aces | 0 | 1.00 | 8.09 | 10.70 | 0.0 | 0.00 | 4.00 | 12.00 | 106.00 | ▇▁▁▁▁ |
| s_err | 0 | 1.00 | 23.71 | 26.29 | 0.0 | 3.00 | 14.00 | 39.00 | 135.00 | ▇▂▁▁▁ |
| digs | 0 | 1.00 | 49.97 | 54.99 | 0.0 | 7.00 | 27.00 | 85.00 | 287.00 | ▇▂▂▁▁ |
| r_err | 0 | 1.00 | 6.02 | 10.01 | 0.0 | 0.00 | 1.00 | 7.00 | 53.00 | ▇▁▁▁▁ |
| block_solos | 0 | 1.00 | 3.19 | 4.88 | 0.0 | 0.00 | 1.00 | 5.00 | 38.00 | ▇▁▁▁▁ |
| block_assists | 0 | 1.00 | 17.31 | 21.51 | 0.0 | 0.00 | 8.00 | 27.00 | 104.00 | ▇▂▁▁▁ |
| b_err | 0 | 1.00 | 2.03 | 2.99 | 0.0 | 0.00 | 1.00 | 3.00 | 21.00 | ▇▁▁▁▁ |
| pts | 0 | 1.00 | 87.30 | 110.78 | 0.0 | 3.00 | 34.50 | 140.50 | 600.50 | ▇▂▁▁▁ |
| bhe | 0 | 1.00 | 0.32 | 0.86 | 0.0 | 0.00 | 0.00 | 0.00 | 11.00 | ▇▁▁▁▁ |
Each player has a variety of statistics such as hit_pct, kills, assists, etc. My first task in trying to answer the question, “did Moni deserve to win the Player of the Year” is coming up with a way to fairly judge a player’s contribution and impact on a team. Since player’s have different roles, they also will have different distributions when it comes to kills, digs, blocks, etc. A setter for instance will touch the ball way more than an attacker but in return, attack less and receive less. See the follow graph that shows the distribution of kills/set depending on which position you play.
Pin hitters (OH, OPPS, RS) tend to have a higher median when it comes to kills/set where setters, and liberos, and defensive specialists have lower kills/set. How do we fairly assess players?
Moni’s Season Statistics
During the 2024-2025 season, Moni had a hitting percentage of 0.398 ((Kills – Errors) / Total Attempts)) and averaged nearly an ace a set while bringing in on average 10 assists per set.
# A tibble: 16 × 3
stat value rate
<chr> <dbl> <dbl>
1 gp 33 0.3
2 s 110 1
3 kills 173 1.57
4 errors 48 0.436
5 total_attacks 314 2.85
6 hit_pct 0.398 0.00362
7 assists 1109 10.1
8 aces 106 0.964
9 s_err 128 1.16
10 digs 168 1.53
11 r_err 0 0
12 block_solos 9 0.0818
13 block_assists 72 0.655
14 b_err 5 0.0455
15 pts 324 2.95
16 bhe 0 0
Comparing Moni to Other Players
I first turned to Google to try to see if I could find some commonly used metrics or composite scores but nothing easily popped out. Then I went on Google Scholar to see if anyone’s made any papers. My cursory search led to nothing sadly. Then I went on ChatGPT and it was able to generate some suggestions. Let’s give those suggestions a try and see where we go from there.
ChatGPT first gave me some suggested metrics to assess players:
We can compare using the normalized z-score composite:
z = (player_stat - position_mean)/position_sd and generate composite score based off these z-scores:
PIS = 0.35 * Offensive_Value + 0.35 * Defensive_Value + 0.20 * Serve_Value + 0.10 * Playmaking_Value
where off = 0.40z_kill_rate + 0.40z_hitting_pct + 0.20z_assist_rate def = 0.55z_block_rate + 0.45z_dig_rate + 0.10z_rerr_rate srv = 0.60z_ace_rate + 0.40z_serr_rate pm = 0.70z_assist_rate + 0.30z_bhe_rate
Contribution per touch (CPT) value per touch = points contributed / touches
Win Probability Added (WPA) Player WPA = total win probability added over match
These last two seem like they require more in-depth data such as play by play data which ncaavolleyballr does have but for now lets save this for later and focus on the first one. To begin, let’s also start a ground rule where you must have played in minimum 30 sets to be considered. I do this so that players who might have played extraordinarily well for 5 sets then never stepped foot on the court again aren’t considered and mess up the distribution. I think 30 sets or ranging from 6-10 matches allows things to even out a little.
| rank | team | player | pos | PIS | z_kill_rate | z_hitting_pct | z_block_rate | z_dig_rate | z_assist_rate | z_ace_rate | z_serr_rate | z_rerr_rate | z_error_rate | z_bhe_rate | off | def | srv | pm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fort Valley St. | Oshane Farquharson | OH | 2.2876167 | 0.04223372 | 0.5460240 | -0.1248540 | 0.398857825 | 14.9584002 | -0.06562512 | 0.51533964 | 1.4734563 | 1.19911003 | -0.4208078 | 3.2269831 | 0.2581620 | 0.16676079 | 10.3446378 |
| 2 | Charleston (WV) | Joonatan Salpakari | L | 1.6858209 | 0.46489347 | 0.2329830 | 5.5009689 | 0.395221217 | 1.5513440 | 4.47626352 | -3.58716221 | -0.2331110 | -0.15929137 | 0.2575454 | 0.5894194 | 3.1800714 | 1.25089322 | 1.1632044 |
| 3 | Lincoln Memorial | Matthew Gentry | MB | 1.6836298 | 2.21719835 | 1.9373855 | 1.8092538 | 1.500345732 | 1.1291423 | 3.83282952 | -1.57707428 | 0.3494200 | 0.96361025 | 0.4439664 | 1.8876620 | 1.7051872 | 1.66886800 | 0.9235896 |
| 4 | Long Beach St. | Moni Nikolov | S | 1.4190775 | 1.04406146 | 0.8095261 | 1.8778209 | 0.718102238 | 0.9987032 | 5.71071019 | -2.14367255 | 0.3450180 | -0.09982329 | 0.6436025 | 0.9411757 | 1.3904493 | 2.56895709 | 0.8921730 |
| 5 | Pepperdine | George Dyer | L | 1.2923005 | 0.56548640 | 0.3435073 | 4.9716384 | -0.242441577 | 1.1782507 | 4.54124810 | -6.07339789 | 0.4121764 | -0.08789357 | 0.2575454 | 0.5992476 | 2.6665200 | 0.29538971 | 0.9020391 |
| 6 | Daemen | Kyle Zelasko | DB | 1.1338106 | 1.67357090 | 1.1552035 | 2.1055685 | 1.759245424 | 0.6107251 | 0.06078953 | -0.07767272 | -1.2154939 | 0.42649828 | 0.3779645 | 1.2536548 | 1.8281737 | 0.00540463 | 0.5408969 |
| 7 | UC Irvine | Hilir Henno | OH | 1.0790196 | 0.90544813 | 1.0494117 | 0.6095088 | 0.706870816 | 0.6294195 | 4.66231184 | -0.91715945 | -0.2350351 | 0.77864949 | 0.3559813 | 0.9078278 | 0.6298182 | 2.43052332 | 0.5473880 |
| 8 | Loyola Chicago | Nicodemus Meyer | MB | 0.9732250 | 1.60362392 | 1.6110706 | 1.8633624 | 0.009975925 | 1.5639833 | -0.65347260 | 0.02715742 | 0.1961918 | 1.20901295 | 0.4439664 | 1.5986744 | 1.0489577 | -0.38122059 | 1.2279782 |
| 9 | Mount Olive | Jackson Lahey | S | 0.9666382 | 0.49616855 | 0.6160337 | 2.4327592 | 0.264407285 | 1.2108698 | 1.56662533 | -1.09901114 | 0.3450180 | 0.54309281 | 0.6436025 | 0.6870548 | 1.4915026 | 0.50037074 | 1.0406896 |
| 10 | Loyola Chicago | Ryan McElligott | S | 0.9642723 | 1.07447506 | 0.9008975 | 1.7462550 | 1.142528109 | 1.2004207 | 1.14936006 | -1.99028778 | 0.1585909 | 0.06788611 | 0.6436025 | 1.0302332 | 1.4904370 | -0.10649908 | 1.0333752 |
Using this composite score, Moni Nikolov is 4! His kill rate is better than about 84% of players. His ace rate is absolutely insane with a z-score of 5.7. This composite argues in favor of Moni Nikolov being the player of the year. Who are the other three that are rated higher than Moni?
Oshane Farquharson is a 6-4 outside hitter for the Fort Valley St. Wildcats where they have won back to back conference (SIAC) champions and the first HBCU to earn consecutive appearances at Nationals. Joonatan Salpakari is a 6-3 outside hitter for the Charleston Golden Eagles where they went 12-17 and lost in the Quarterfinals of the EIVA tournament. Matthew Gentry is a 6-6 middle blocker for the Lincoln Memorial Railsplitters and they went 24-1 last season and won their IVA tournament.
Sidenote: Oshane has an insane assist rate z-score of 13.54 for an OH. He averages 7.04 assists per set. Is this really true for an OH? Turns out…no! After some ad hoc diving and some footage review, I think he was misreported as an outside but he’s actually a setter.
This score seems to argue in favor of Nikolov’s award but what about winning? At the end of the day, its all about winning. How did the top 10 players end up win wise?
| rank | team | player | PIS | Overall.record |
|---|---|---|---|---|
| 1 | Fort Valley St. | Oshane Farquharson | 2.2876167 | 16-10 (0.615) |
| 2 | Charleston (WV) | Joonatan Salpakari | 1.6858209 | 12-17 (0.414) |
| 3 | Lincoln Memorial | Matthew Gentry | 1.6836298 | NA |
| 4 | Long Beach St. | Moni Nikolov | 1.4190775 | 30-3 (0.909) |
| 5 | Pepperdine | George Dyer | 1.2923005 | 21-10 (0.677) |
| 6 | Daemen | Kyle Zelasko | 1.1338106 | NA |
| 7 | UC Irvine | Hilir Henno | 1.0790196 | 21-7 (0.750) |
| 8 | Loyola Chicago | Nicodemus Meyer | 0.9732250 | 25-4 (0.862) |
| 9 | Mount Olive | Jackson Lahey | 0.9666382 | 20-4 (0.833) |
| 10 | Loyola Chicago | Ryan McElligott | 0.9642723 | 25-4 (0.862) |
Had to manually look it up because it didn’t get scraped but LMU had a record of 24-1 and Daemen had a record of 15-13 however, if you look at their opponents, their opponents are hardly against top ranked teams. For instance, LMU’s only lost came at the hands of a nationally ranked team where they proceeded to lose 3-0. Outside of LMU, LBSU is the only team that had above a 0.900 win rate.
On top of Moni being a top 5 player according to this metric, his team also had one of the highest win rate and they won the national championship. This kid’s pretty good I suppose!
Comparing Moni to Previous Player of the Years
Moni’s pretty good but how does he stack up against other Player of the Years. Where does he rank? What do player of the years have in common?
There have been 35 Players of the Year from 1991-2025. Sadly the R package does not go beyond 2020 and https://stats.ncaa.org/rankings/change_sport_year_div goes back to 2008. After some time spent trying to scrape the data from stats.ncaa.org, I just ended up grabbing the data manually. The juice was not worth the squeeze especially for 18 rows.
| Name | ncaa_poy_2009_2025 |
| Number of rows | 17 |
| Number of columns | 21 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 16 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 10 | 21 | 0 | 16 | 0 |
| pos | 0 | 1 | 1 | 3 | 0 | 3 | 0 |
| year | 0 | 1 | 7 | 7 | 0 | 17 | 0 |
| team | 0 | 1 | 3 | 19 | 0 | 9 | 0 |
| class | 0 | 1 | 3 | 3 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| s | 0 | 1.00 | 99.29 | 16.23 | 61.00 | 96.00 | 102.00 | 105.00 | 120.0 | ▂▁▃▇▃ |
| ms | 7 | 0.59 | 24.00 | 12.21 | 1.00 | 28.00 | 29.00 | 30.00 | 33.0 | ▂▁▁▁▇ |
| kills | 0 | 1.00 | 356.12 | 181.10 | 46.00 | 244.00 | 388.00 | 499.00 | 661.0 | ▅▅▆▇▃ |
| errors | 0 | 1.00 | 91.94 | 43.75 | 11.00 | 77.00 | 101.00 | 121.00 | 157.0 | ▃▁▂▇▃ |
| total_attacks | 0 | 1.00 | 711.06 | 372.72 | 97.00 | 487.00 | 755.00 | 982.00 | 1287.0 | ▇▃▇▆▇ |
| hit_pct | 0 | 1.00 | 0.37 | 0.05 | 0.28 | 0.35 | 0.38 | 0.39 | 0.5 | ▂▃▇▁▁ |
| assists | 0 | 1.00 | 285.18 | 492.82 | 2.00 | 20.00 | 31.00 | 62.00 | 1455.0 | ▇▁▁▂▁ |
| aces | 0 | 1.00 | 44.71 | 21.98 | 19.00 | 29.00 | 38.00 | 57.00 | 106.0 | ▇▂▆▁▁ |
| s_err | 3 | 0.82 | 72.79 | 31.76 | 22.00 | 53.00 | 74.50 | 95.00 | 128.0 | ▆▃▇▆▃ |
| digs | 0 | 1.00 | 171.24 | 43.23 | 93.00 | 146.00 | 168.00 | 204.00 | 258.0 | ▃▆▇▇▂ |
| r_err | 6 | 0.65 | 18.64 | 16.30 | 1.00 | 7.50 | 15.00 | 22.00 | 55.0 | ▃▇▁▁▁ |
| block_solos | 0 | 1.00 | 7.94 | 3.34 | 1.00 | 6.00 | 8.00 | 10.00 | 13.0 | ▂▂▇▃▅ |
| block_assists | 0 | 1.00 | 57.71 | 14.52 | 38.00 | 51.00 | 54.00 | 64.00 | 101.0 | ▃▇▃▁▁ |
| b_err | 4 | 0.76 | 5.69 | 3.95 | 1.00 | 2.00 | 5.00 | 7.00 | 14.0 | ▇▇▆▁▃ |
| pts | 3 | 0.82 | 433.93 | 172.09 | 96.50 | 335.38 | 474.75 | 559.25 | 646.0 | ▃▁▅▆▇ |
| bhe | 7 | 0.59 | 2.20 | 1.14 | 0.00 | 2.00 | 2.00 | 3.00 | 4.0 | ▂▂▇▆▂ |
What do the best players have in common? Their hitting percentages are quite high, with only one player below 0.300 (0.275) with most hovering between 0.350 and 0.400.
| rank | team | name | pos | year | PIS | z_kill_rate | z_hitting_pct | z_block_rate | z_dig_rate | z_assist_rate | z_ace_rate | z_serr_rate | z_rerr_rate | z_error_rate | z_bhe_rate | off | def | srv | pm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Stanford | Kawika Shoji | S | 2009-10 | 0.9526433786 | 0.47614078 | 0.8612867 | -0.16575457 | 2.45262926 | 2.2593258 | -0.7237880 | 1.5843333 | 0.79489130 | 0.83950785 | 0.9001090 | 0.98683615 | 1.09200729 | 0.19946050 | 1.8515608 |
| 2 | Pepperdine | Paul Carroll | OPP | 2008-09 | 0.6548768259 | 0.11127327 | 0.3710930 | 1.95022403 | 1.22242132 | -0.5487738 | -0.7112227 | 1.5843333 | 0.79489130 | 0.50565211 | 0.9001090 | 0.08319173 | 1.70220194 | 0.20699969 | -0.1141090 |
| 3 | Long Beach St. | Moni Nikolov | S | 2024-25 | 0.5935562139 | 1.03154555 | 0.4989696 | 0.33198844 | -0.63404115 | 1.4851375 | 2.2346965 | -1.4458687 | 0.79489130 | -0.72849153 | 0.9001090 | 0.90923356 | -0.02323574 | 0.76247042 | 1.3096289 |
| 4 | Long Beach St. | Josh Tuaniga | S | 2017-18 | 0.4682620409 | 2.44287346 | 2.5663084 | -1.63919565 | -0.03720823 | 1.5574282 | -0.7340073 | -0.3884545 | 0.79489130 | 0.84887278 | -1.2773958 | 2.31515840 | -0.83881218 | -0.59578618 | 0.7069810 |
| 5 | BYU | Gabi Garcia Fernandez | OPP | 2019-20 | 0.2671344466 | -0.19839555 | -0.6732328 | 1.72325838 | -0.64294582 | -0.5554003 | 2.0328446 | -0.6782279 | 0.56482283 | -0.93811731 | 0.9001090 | -0.45973141 | 0.71494878 | 0.94841560 | -0.1187475 |
| 6 | Long Beach St. | TJ DeFalco | OH | 2018-19 | 0.0451325015 | 1.05816955 | 0.4776568 | 0.07380840 | -0.40539275 | -0.4600436 | 0.3953348 | -1.0447858 | -0.14971674 | -0.82794399 | -0.4817690 | 0.52232182 | -0.15680379 | -0.18071343 | -0.4665612 |
| 7 | Hawaii | Rado Parapunov | OPP | 2020-21 | 0.0192907452 | -0.07054534 | -0.9716116 | 1.61551205 | -0.08621352 | -0.5759844 | -0.4643753 | -0.5389933 | 0.57898089 | -1.71788308 | 0.9001090 | -0.53205968 | 0.90763363 | -0.49422251 | -0.1331564 |
| 8 | Loyola Chicago | Thomas Jaeschke | OH | 2014-15 | 0.0007796494 | 0.13569402 | 0.1366525 | -0.63196054 | 1.03889326 | -0.5096668 | 0.5052812 | -0.5586074 | -0.81319143 | 0.02749145 | 0.1515917 | 0.00700524 | 0.03860453 | 0.07972577 | -0.3112892 |
| 9 | Southern California | Murphy Troy | OPP | 2010-11 | -0.0463428840 | -0.27262867 | -0.3535413 | 0.28098023 | -0.54800475 | -0.5648417 | -0.2412527 | 1.5843333 | 0.79489130 | -0.23011595 | 0.9001090 | -0.36343632 | -0.01257388 | 0.48898170 | -0.1253565 |
| 10 | Long Beach St. | TJ DeFalco | OH | 2016-17 | -0.0894557102 | 0.28033217 | 0.4137185 | 0.04994302 | 0.02393460 | -0.4958450 | -0.7287474 | -0.3049011 | -0.09944348 | 0.32141887 | -0.5088647 | 0.17845127 | 0.02829489 | -0.55920888 | -0.4997509 |
Moni’s ranked third on this list, right behind Kawika Shoji (Erik Shoji’s older brother) and Paul Carroll. Do you agree with this assessment?
Player Profiles
To wrap this up, let’s take a closer look by performing a Principle Component Analysis using “hit_pct”, “kill_rate”, “error_rate”, “ace_rate”, “serr_rate”, “dig_rate”, “rerr_rate”, “block_rate”, “block_error_rate”, “assist_rate”, and “bhe_rate” to generate player profiles. As alwasy make sure to scale and center variables since the center and scales are different across the variables.
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 1.7633 1.6509 1.4065 1.1862 0.80125 0.70061 0.50223
Proportion of Variance 0.2826 0.2478 0.1798 0.1279 0.05836 0.04462 0.02293
Cumulative Proportion 0.2826 0.5304 0.7103 0.8382 0.89653 0.94115 0.96408
PC8 PC9 PC10 PC11
Standard deviation 0.42532 0.38700 0.25391 0.002035
Proportion of Variance 0.01644 0.01362 0.00586 0.000000
Cumulative Proportion 0.98052 0.99414 1.00000 1.000000
PC1 PC2 PC3
hit_pct -0.23988215 0.44084055 -0.26601060
kill_rate -0.35039457 0.22872112 -0.41914811
error_rate -0.11745159 -0.45712062 -0.18337249
ace_rate -0.23476064 -0.37701756 -0.29889921
serr_rate -0.46631724 -0.23712738 -0.03771777
dig_rate 0.33019383 0.38276045 -0.17681749
rerr_rate -0.19077279 0.02111884 0.50554668
block_rate 0.23890069 -0.32318354 -0.30502767
block_error_rate -0.45163448 0.10242376 -0.04026420
assist_rate -0.06833274 0.25493365 -0.32246529
bhe_rate -0.34954566 0.13832714 0.37904559
3 Principal Components explain roughly 71% of the variance and if we take a look at those three loadings, we can pick out different characteristics. For PC1, the only positives are dig rate and block rate while the rest are negative so high PC1 seems to align with defense ability while low values correspond to perhaps more offensive players. PC2 has high values in hit_pct, dig_rate, assist_rate so it seems like this loading is describing a fundamentally sound or high variance player. PC3 seems to be be about ball control seeing the high values for receiving error and ball handling error.
From plotting PC1 and PC2, we can see Opposites then to have high PC 1 scores but low PC 2 scores suggesting they are quite controlled and high variance when playing offense. Setters are all over the place. Josh Tuaniga has very low PC1 but very high PC 2 pointing to his efficiency but risky plays. Outsides are more concentrated in PC2 but are spread out along PC 1. Taylor Sander and Taylor Crabb who now play beach volleyball have the two highest PC1 loadings for outsides which reinforces their ability to control the ball, whereas someone like Alex Nikolov (Moni’s brother) and TJ DeFalco are more risky and offensive/aggressive players.
Here, I’ve plotted them on a radar plot and we can see how these positions differ and how Moni’s skill distribution compares to theirs. Moni’s profile is actually quite different than the average Setter profile, his dig rate is lower but his service ace and error rate are all very high. This is probably attributed to his 6-9 frame which allows him to be more aggressive at the baseline and in offensive opportunities.
In conclusion, Moni ranked highly on this AI generated metric we calculated, led his team to a chip, and when compared to previous player of the years, he was ranked third. His long frame has enabled him to become one of the more aggressive Setters we’ve seen, earning him a huge Ace count (but also a huge serving error rate) in addition to block rate and kills. (There’s this clip of Andrew Rowan pushing the ball over and Moni hammering it down like it was a perfect set to him). The data backs up his award but at the end of the day, if you’ve watched his game, it doesn’t take much to see that this guy’s good, really good.
Appendix: All code for this report
library(ncaavolleyballr)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(gt)
# grabs player information for 2024 season
# system.time({
# ncaavolleyballr::division_stats(
# year = 2024,
# sport = "MVB",
# level = "teamseason",
# save = TRUE,
# path = "."
# )
# })
# ncaavolleyballr::division_stats(
# year = 2024,
# sport = "MVB",
# level = "teamseason",
# save = TRUE,
# path = "."
# )
mvb_playerseason_div1_2024 <- read_csv("~/Documents/upskilling/Volleyball Analytics/data/mvb_playerseason_div1_2024.csv")
# ncaa_mvb_2024_player_statistics <- readRDS("~/Documents/upskilling/Volleyball Analytics/data/ncaa_mvb_2024_player_statistics.RDS")
mvb_playerseason_div1_2024 <- mvb_playerseason_div1_2024 |>
janitor::clean_names() |>
select(team, conference, player, yr, pos, ht, gp:bhe, -c(tripl_dbl, ms)) |>
mutate(
ht_cm = str_match(ht, "^(\\d+)-(\\d+)$"), # extract feet + inches
feet = as.numeric(ht_cm[,2]),
inches = as.numeric(ht_cm[,3]),
height_cm = (feet * 12 + inches) * 2.54 # convert total inches → cm
, .before = ht
) %>%
filter(pos != "") |>
replace_na(list(total_attacks = 0,
kills = 0,
errors = 0,
r_err = 0,
block_solos = 0,
block_assists = 0,
b_err = 0,
pts = 0,
bhe = 0,
aces = 0,
assists = 0,
s_err = 0,
digs = 0 )) |>
select(-ht, -ht_cm, -feet, -inches)
skimr::skim(mvb_playerseason_div1_2024)
# example of uneven distribution
ggplot(mvb_playerseason_div1_2024, aes(x = kills/s, fill = pos)) +
geom_boxplot()
mvb_playerseason_div1_2024 |>
filter(player == "Moni Nikolov") |>
select(gp, s, kills: bhe) |>
pivot_longer(
cols = everything(),
names_to = "stat",
values_to = "value"
) |>
mutate(rate = value / value[stat == "s"])
curr_data <- mvb_playerseason_div1_2024 |>
filter(s >= 30) |>
mutate(
kill_rate = kills / total_attacks,
error_rate = errors / total_attacks,
ace_rate = aces / s,
serr_rate = s_err / s,
dig_rate = digs / s,
rerr_rate = r_err / s,
block_rate = (block_solos + block_assists/2) / s,
block_error_rate = b_err / s,
assist_rate = assists / s,
bhe_rate = bhe / s
) |>
group_by(pos) %>%
mutate(
z_kill_rate = scale(kill_rate)[,1],
z_hitting_pct = scale(hit_pct)[,1],
z_block_rate = scale(block_rate)[,1],
z_dig_rate = scale(dig_rate)[,1],
z_assist_rate = scale(assist_rate)[,1],
z_ace_rate = scale(ace_rate)[,1],
z_serr_rate = -scale(serr_rate)[,1], # negative stat → invert
z_rerr_rate = -scale(rerr_rate)[,1],
z_error_rate = -scale(error_rate)[,1],
z_bhe_rate = -scale(bhe_rate)[,1]
) |>
mutate(
off = 0.40*z_kill_rate +
0.40*z_hitting_pct +
0.20*z_assist_rate,
def = 0.55*z_block_rate +
0.45*z_dig_rate +
0.10*z_rerr_rate,
srv = 0.60*z_ace_rate +
0.40*z_serr_rate,
pm = 0.70*z_assist_rate +
0.30*z_bhe_rate,
PIS = 0.35*off + 0.35*def + 0.20*srv + 0.10*pm
) |> ungroup()
curr_data |>
select(team, player, pos, PIS, z_kill_rate:pm) |>
arrange(desc(PIS)) |>
mutate(rank = row_number(), .before = 1) |>
head(10) |>
gt::gt()
temp_data <- curr_data |>
select(team, player, PIS, z_kill_rate:pm) |>
arrange(desc(PIS)) |>
mutate(rank = row_number(), .before = 1) |>
head(10)
team_standings <- map(
mvb_teams |> filter(team_name %in% temp_data$team, yr == 2024) |> pull(team_id),
~ {
Sys.sleep(15)
temp <- team_season_info(team_id = .x)
c(temp$team_info$team_name, temp$record)
},
.progress = TRUE
)
team_results <- do.call(rbind,team_standings) |> data.frame() |>
rename(team_name = V1)
saveRDS(team_results, "ncaa_mvb_2024_team_results.RDS")
ncaa_mvb_2024_team_results <- readRDS("~/Documents/upskilling/Volleyball Analytics/data/ncaa_mvb_2024_team_results.RDS")
curr_data |>
select(team, player, PIS, z_kill_rate:pm) |>
arrange(desc(PIS)) |>
mutate(rank = row_number(), .before = 1) |>
head(10) |>
left_join(ncaa_mvb_2024_team_results, by = join_by(team == team_name)) |>
select(rank, team, player, PIS, Overall.record
) |>
gt()
ncaa_poy_2009_2025 <- readxl::read_excel("~/Documents/upskilling/Volleyball Analytics/data/ncaa_poy_2009_2025.xlsx") |>
janitor::clean_names() |>
rename(hit_pct = pct)
skimr::skim(ncaa_poy_2009_2025)
curr_data <- ncaa_poy_2009_2025 |>
filter(s >= 30) |>
replace_na(list(total_attacks = 0,
kills = 0,
errors = 0,
r_err = 0,
block_solos = 0,
block_assists = 0,
b_err = 0,
pts = 0,
bhe = 0,
aces = 0,
assists = 0,
s_err = 0,
digs = 0 )) |>
mutate(
kill_rate = kills / total_attacks,
error_rate = errors / total_attacks,
ace_rate = aces / s,
serr_rate = s_err / s,
dig_rate = digs / s,
rerr_rate = r_err / s,
block_rate = (block_solos + block_assists/2) / s,
block_error_rate = b_err / s,
assist_rate = assists / s,
bhe_rate = bhe / s
)
curr_data |>
mutate(
z_kill_rate = scale(kill_rate)[,1],
z_hitting_pct = scale(hit_pct)[,1],
z_block_rate = scale(block_rate)[,1],
z_dig_rate = scale(dig_rate)[,1],
z_assist_rate = scale(assist_rate)[,1],
z_ace_rate = scale(ace_rate)[,1],
z_serr_rate = -scale(serr_rate)[,1], # negative stat → invert
z_rerr_rate = -scale(rerr_rate)[,1],
z_error_rate = -scale(error_rate)[,1],
z_bhe_rate = -scale(bhe_rate)[,1]
) |>
mutate(
off = 0.40*z_kill_rate +
0.40*z_hitting_pct +
0.20*z_assist_rate,
def = 0.55*z_block_rate +
0.45*z_dig_rate +
0.10*z_rerr_rate,
srv = 0.60*z_ace_rate +
0.40*z_serr_rate,
pm = 0.70*z_assist_rate +
0.30*z_bhe_rate,
PIS = 0.35*off + 0.35*def + 0.20*srv + 0.10*pm
) |>
ungroup() |>
select(team, name, pos, year, PIS, z_kill_rate:pm) |>
arrange(desc(PIS)) |>
mutate(rank = row_number(), .before = 1) |>
head(10) |>
gt::gt()
# ---- Select the variables for PCA ----
rate_vars <- c(
"hit_pct", "kill_rate", "error_rate", "ace_rate", "serr_rate",
"dig_rate", "rerr_rate", "block_rate", "block_error_rate",
"assist_rate", "bhe_rate"
)
pca_df <- curr_data |>
select(all_of(rate_vars)) |>
drop_na() # should be fine already, but just in case
# ---- Run PCA on scaled data ----
pca_res <- prcomp(pca_df, center = TRUE, scale. = TRUE)
# PCA scores (PC1, PC2, PC3, ...)
scores <- as.data.frame(pca_res$x)
summary(pca_res)
# # ---- K-means on the first 3 PCs ----
# set.seed(123)
# k <- 3 # you can try 2,3,4 and compare
# km <- kmeans(scores[, 1:3], centers = k, nstart = 50)
#
# ---- Attach cluster + PCs back to curr_data ----
clustered <- curr_data |>
mutate(
PC1 = scores$PC1,
PC2 = scores$PC2,
PC3 = scores$PC3
# cluster = factor(km$cluster)
)
ggplot(clustered, aes(PC1, PC2, #color = cluster,
label = name, color = pos)) +
geom_point(size = 3) +
ggrepel::geom_text_repel(show.legend = FALSE) +
theme_minimal() +
labs(
title = "PCA + k-means Clustering of POY Players",
subtitle = "Clustering on first 3 PCs",
color = "Cluster",
shape = "Position"
)
# cluster_profiles <- clustered |>
# group_by(cluster) |>
# summarise(
# n_players = n(),
# across(all_of(rate_vars), mean, .names = "mean_{.col}")
# )
#
# cluster_profiles
pca_res$rotation[,1:3]
library(fmsb)
# ---- z-score the rate vars ----
scaled_rates <- curr_data |>
select(all_of(rate_vars)) |>
mutate(across(everything(), scale)) # each column -> mean 0, sd 1
# Attach back to clusters
clustered_scaled <- curr_data |>
select(name, pos) |>
bind_cols(
scaled_rates
)
cluster_means <- clustered_scaled |>
group_by(pos) |>
summarise(across(all_of(rate_vars), mean), .groups = "drop")
# cluster_means
# Set reasonable radar bounds for z-scores
radar_max <- 2 # +2 SD
radar_min <- -1 # -2 SD
# Make a named vector of colors for clusters (optional)
cluster_cols <- c("OH" = "red", "OPP" = "blue", "S" = "darkgreen")
par(mfrow = c(1, 3)) # multiple panels side by side; or comment this out and do one by one
for (cl in unique(clustered$pos)) {
cl_row <- cluster_means |>
filter(pos == cl) |>
select(all_of(rate_vars))
# Build radar data frame
radar_df <- rbind(
rep(radar_max, length(rate_vars)),
rep(radar_min, length(rate_vars)),
cl_row
)
colnames(radar_df) <- rate_vars
rownames(radar_df) <- c("max", "min", paste("Cluster", cl))
radarchart(
radar_df,
axistype = 1,
pcol = cluster_cols[cl],
pfcol = scales::alpha(cluster_cols[cl], 0.4),
plwd = 2,
cglty = 1,
cglcol = "grey80",
caxislabels = c("-2", "-1", "0", "1", "2"),
# caxislabels = c("-1", "0", "1"),
title = paste("Cluster", cl, "Profile (z-scores)")
)
}
player_name <- "Moni Nikolov"
player_row <- clustered_scaled |>
filter(name == player_name) |>
select(all_of(rate_vars))
radar_df_player <- rbind(
rep(radar_max, length(rate_vars)),
rep(radar_min, length(rate_vars)),
player_row
)
colnames(radar_df_player) <- rate_vars
rownames(radar_df_player) <- c("max", "min", player_name)
radarchart(
radar_df_player,
axistype = 1,
pcol = "purple",
pfcol = scales::alpha("purple", 0.4),
plwd = 2,
cglty = 1,
cglcol = "grey80",
caxislabels = c("-2", "-1", "0", "1", "2"),
title = paste(player_name, "Profile (z-scores vs POY group)")
)