Everyone seems to have a love/hate relationship with NFL Draft comparisons. If you search any prospect’s name followed by “draft comparison”, you’ll find them compared to everyone from the highs of Patrick Mahomes to the lows of Nathan Peterman. Some go through extensive research and film analysis to come up with their comparisons... while others just kinda compare them to another QB from the same school. Today we are going to let the computer do our comparisons, through a method called K-means clustering.
K-means clustering is a very popular unsupervised machine learning method that essentially takes a dataset and groups similar things together (in this case, some of the notable QB prospects from 2016-2021). It is a very simple and effective way to analyze similarities between QB prospects. Some of the variables used include Expected Points Added (EPA) per play, passing and rushing explosiveness, EPA lost on sacks and INTs, and completion percentage.
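For the curious, here is a minimal sketch of what K-means does under the hood, written in plain Python. It is not the code behind the clusters in this post: the QB labels and numbers below are made up for illustration, the helper names (`standardize`, `kmeans`) are hypothetical, and rescaling each variable first is my assumption for stats measured in different units. In practice you would likely use a library implementation instead.

```python
import random

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Assign points to the nearest centroid, then move each centroid to the
    mean of its points, repeating until the assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up numbers: [EPA per play, completion %, rushing explosiveness]
names = ["QB A", "QB B", "QB C", "QB D", "QB E", "QB F"]
stats = [
    [0.30, 68.0, 0.9],
    [0.28, 67.0, 1.0],
    [0.32, 69.0, 0.8],
    [0.10, 58.0, 1.9],
    [0.12, 59.0, 2.0],
    [0.08, 57.0, 1.8],
]

labels, centroids = kmeans(standardize(stats), k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

The cluster numbers themselves are arbitrary; what matters is which QBs end up sharing one.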
As with any sort of draft analysis, keep in mind that the draft is ultimately a crapshoot, and comparisons to current/former NFL players are rarely models of accuracy. Without further ado, let’s get to clustering!
In order to plot multiple variables on an X/Y scatter plot, I had to use principal component analysis (PCA) to reduce the number of dimensions to two, the most a scatter plot can show. The number of clusters is something you have to choose manually, but there are various techniques for identifying the optimal number. In this case, we have 5 clusters. Using the eye test, you can see some interesting groupings and of course some bizarre ones (Deshaun Watson... I’m sorry!). Now that we have our clusters, we can go back into the dataset and see how these QBs compare within their clusters.
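If you are wondering how PCA squeezes several stats down to two plottable axes, the sketch below shows one way to do it in plain Python. Treat it as an illustration under assumptions, not the code used for the chart: the numbers are made up, the function names are hypothetical, standardizing the columns first is my choice, and the power-iteration approach stands in for whatever a real PCA library does internally.

```python
def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_component(matrix, iters=500):
    """Power iteration: repeatedly multiply a vector by the matrix and
    renormalize, which converges toward the dominant eigenvector."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary fixed starting vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigval = sum(a * b for a, b in zip(v, mv))
    return eigval, v

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    x = standardize(rows)
    cov = covariance(x)
    val1, pc1 = top_component(cov)
    d = len(cov)
    # Remove the first component so the next run finds the second one.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    _, pc2 = top_component(deflated)
    return [[sum(a * b for a, b in zip(r, pc1)),
             sum(a * b for a, b in zip(r, pc2))] for r in x]

# Made-up numbers: four stats per QB, reduced to an (x, y) pair for plotting.
stats = [
    [0.30, 68.0, 0.9, 1.3],
    [0.28, 66.0, 1.1, 1.5],
    [0.10, 58.0, 1.9, 1.2],
    [0.12, 60.0, 2.0, 1.6],
    [0.20, 63.0, 1.4, 1.1],
    [0.22, 64.0, 1.2, 1.7],
]

coords = pca_2d(stats)
for x_val, y_val in coords:
    print(round(x_val, 3), round(y_val, 3))
```

Each printed pair is one QB’s position on the scatter plot. The first axis captures the most variation across the stats, and the second captures the most of what is left.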
Cluster 1 (Jack of all trades, master of none)
Strengths: None
Weaknesses: Rushing Explosiveness, Rushing EPA/Play
Deshaun Watson
Josh Rosen
Sam Darnold
Kyle Trask (2021 Draft Prospect)
Mitch Trubisky
Dwayne Haskins
Jamie Newman (2021 Draft Prospect)
Ok, you may be wondering, how in the heck is Deshaun Watson in this cluster? When you begin to look into his college numbers, you can start to understand why. Compared to the other 4 clusters, this cluster ranked last in both average rushing explosiveness and rushing EPA/Play. Deshaun Watson actually ranked LAST in rushing explosiveness out of these 25 QBs, and 18th in rushing EPA/Play. This cluster ranked 3rd or 4th in virtually every other variable used, which is how it earned its title of the jack of all trades, but the master of none. Considering the wide range of NFL outcomes in this cluster, it will be interesting to see the career trajectories of Kyle Trask and Jamie Newman.
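The “this cluster ranked last in X” statements come from averaging each variable within a cluster and then ranking the clusters against each other. Here is a rough sketch of that bookkeeping in plain Python. The numbers, cluster labels, and function names are all invented for illustration, and I am assuming a higher value is better for both stats shown.

```python
from collections import defaultdict

def cluster_means(rows, labels, names):
    """Average every variable within each cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {name: sum(col) / len(col)
                for name, col in zip(names, zip(*members))}
        for label, members in groups.items()
    }

def rank_clusters(means, name, higher_is_better=True):
    """Rank clusters on one variable; rank 1 is the best cluster."""
    ordered = sorted(means, key=lambda label: means[label][name],
                     reverse=higher_is_better)
    return {label: position for position, label in enumerate(ordered, start=1)}

# Made-up numbers: two rushing stats for six QBs spread over three clusters.
names = ["rush_explosiveness", "rush_epa_per_play"]
rows = [
    [0.8, 0.05],
    [1.0, 0.15],
    [1.9, 0.40],
    [2.1, 0.50],
    [1.4, 0.20],
    [1.6, 0.30],
]
labels = [1, 1, 2, 2, 3, 3]

means = cluster_means(rows, labels, names)
ranks = rank_clusters(means, "rush_explosiveness")
for label in sorted(ranks):
    print("cluster", label, "ranks", ranks[label], "in rushing explosiveness")
```

For stats where lower is better, passing `higher_is_better=False` flips the ordering.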
Cluster 2 (Wild and Explosive)
Strengths: Passing Explosiveness, Rushing Explosiveness, Total Rush EPA
Weaknesses: Completion Percentage, EPA lost to Interceptions, EPA/Pass
Josh Allen
Patrick Mahomes
Lamar Jackson
Speed. Explosion. Dazzling plays. This is the cluster that refuses to play boring football. This group was far and away the best rushing group, averaging 120.03 total expected points gained on the ground (the next closest cluster averaged 49.24 total rushing EPA). As hard as it is to believe now, these guys were not as efficient through the air, due in part to lower completion percentages and EPA lost to interceptions. There are no 2021 NFL Draft prospects in this cluster, which seems disappointing given the superstar performances this cluster produces every Sunday afternoon.
Cluster 3 (No Thank You!)
Strengths: None
Weaknesses: Total Pass EPA, EPA/Play, Passing Explosiveness, EPA lost to sacks and INTs, Completion Percentage, EPA lost to fumbles
Daniel Jones
Kellen Mond (2021 Draft Prospect)
Drew Lock
I’ve seen some in the media and on social media try to hype up Kellen Mond as a prospect that could be overlooked. He very well may be, and the answer could lie in his film, but numbers-wise, there is just nothing to like. This group averaged -134.59 total EPA on interceptions, far and away the worst of the clusters. They also lost an average of 143.47 expected points due to sacks. Considering Mond is grouped with Daniel Jones and Drew Lock, this appears to be a cut-and-dried “not great in college = not great in the pros” situation.
Cluster 4 (Video Game Numbers)
Strengths: Everything
Weaknesses: Total Rush EPA
Tua Tagovailoa
Mac Jones (2021 Draft Prospect)
Kyler Murray
Take your pick of the stats and they were in the elite category. EPA/Play, explosive passing, completion percentage, you name it. In terms of maximizing expected points, there were no more efficient college players than these guys. The only knock on this group was total rushing EPA, but they were the most efficient bunch when they did decide to run, averaging 0.424 EPA/Rush, easily the highest average of the 5 clusters. Mac Jones enters the draft without nearly as much hype as the other 2 QBs in this cluster. He could fall to a more established team, and things could get interesting from there.
Cluster 5 (Diet Cluster 4)
Strengths: EPA/Play, Completion Percentage, Total Rush EPA, EPA lost on INTs
Weaknesses: Passing Explosiveness, EPA lost on sacks
Jalen Hurts
Dak Prescott
Justin Herbert
Baker Mayfield
Jared Goff
Justin Fields (2021 Draft Prospect)
Trevor Lawrence (2021 Draft Prospect)
Zach Wilson (2021 Draft Prospect)
Joe Burrow
Cluster 4 ranked first in virtually every variable used for this modeling. The cluster that came in 2nd in virtually every variable? That would be Cluster 5! On average, this cluster completed 66% of its passes, with an average EPA/Play of 0.24 over the course of their respective college careers. The only knocks on this cluster were a lack of explosiveness in the pass game and an average loss of 107.59 expected points due to sacks. Three of the top QB prospects from the 2021 Draft reside in this cluster, and for the most part everything seems to check out.
Conclusion
Clustering has its pros and cons, but ultimately it gives us a quick and easy way to group prospects together and see if we can make any observations. These clusters seem to tell a pretty decent story of where the 2021 NFL Draft QB prospects stack up against some of the notable past prospects. Numbers won’t tell you the whole story, but they can be a useful tool in trying to solve the complex puzzle of QB evaluation.
If you want to dive into the data like I do, check out @CFB_Data and @cfbscrapR on Twitter, where you can learn how to get started in the world of College Football data analysis!
If you want to see more charts and one-off analysis, follow my Twitter page, @CFBNumbers