The Malik Willis Enigma: Rushing Production Clustering of the NFL Draft QB Prospects
"Malik Willis is the next Lamar Jackson!"... is he?
The consensus on the 2022 NFL Draft QB class is that it is underwhelming and not as strong as previous classes. During the season it wasn’t clear who would end up being the first QB on the big boards. Now it is starting to look like one QB is separating himself from the pack: Liberty QB Malik Willis.
Malik Willis is the QB most analysts believe has the highest ceiling in this draft class. This is due in part to two things: a rocket for an arm, and his ability to make plays with his legs. His ability to extend plays with his legs and rip off long runs has earned him some lofty comparisons, namely former Louisville and current superstar Baltimore Ravens QB Lamar Jackson.
Looking at Total Expected Points Added (EPA), we can see that Malik Willis has generated the most EPA from rushes since Lamar Jackson out of draft prospects in the College Football Playoff era. However, you can also see that second place is a very distant second behind Jackson. To be fair to Willis, Jackson accumulated a lot more rushes in his college career.
If we look at the Expected Points Added per Rush, we can actually see Willis and Kyler Murray leapfrog over Jackson. Of course, this is just one metric. While EPA does provide a great way to measure a player’s per-play efficiency, we need more information to make a statement as bold as comparing Malik Willis to Lamar Jackson.
We could compile a bunch of rushing metrics and use our eyeballs to make comparisons, or we could do something simpler: use an algorithm! Clustering is no stranger to the newsletter, as we used it on draft prospects last year and on teams this year. However, this time we will be using a slightly different algorithm. One drawback of the K-Means clustering algorithm is that it can be very sensitive to outliers, which in our case would be Lamar Jackson’s rushing volume. Instead, we will be using the K-Medoids clustering algorithm.
The main difference between the two is with the K-Means clustering algorithm, the center of a cluster is the calculated mean value of all of the data points in that particular cluster. With the K-Medoids algorithm, the center is actually one of the data points in the cluster. As stated above, this method is less sensitive to noise and outliers in the data.
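To make that difference concrete, here is a minimal sketch in plain Python. The carry totals are made-up numbers (not real prospect data), with one extreme value standing in for an outlier like Jackson’s volume. It only compares the two kinds of cluster center for a single group; it is not a full clustering implementation.

```python
def mean_center(values):
    """K-Means style center: the average of every point in the cluster."""
    return sum(values) / len(values)


def medoid_center(values):
    """K-Medoids style center: the actual data point with the smallest
    total distance to all the other points in the cluster."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


# Hypothetical career carry totals; 655 plays the role of the outlier.
carries = [95, 110, 120, 130, 655]

print(mean_center(carries))    # dragged upward by the outlier
print(medoid_center(carries))  # stays with the bulk of the group
```

The mean lands at a value no player in the group actually has, while the medoid is always a real player near the middle of the pack.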
The Ingredients For Clustering
We need to gather as many metrics as we can in order to paint the most complete picture possible. To do this we used various metrics that can be broken down into three groups:
Volume Stats: Total carries, total yards gained, total EPA gained from rushing, total EPA lost from fumbles, total points added from rushing (Used in ESPN’s QBR). This captures how much a QB ran and how much their running contributed to the offensive production. These three production metrics represent the rawest form of production (yards), taking that raw production and introducing context (EPA), and finally adjusting that production based on opponent/situation (Points Added from QBR).
Rate Stats: Yards Per Play, EPA per rush, points added (PA) per rush from QBR, success rate (plays with positive EPA), and explosive play rate (plays with a 75th+ percentile EPA). These answer the question of “when they did decide to run, how efficient were they?”
Final Year Production: Isolating the metrics above to the player’s final year in college. Did they end college on a high note or did they regress a little?
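As a rough illustration of how the volume and rate stats above relate, here is a small sketch. The play records, the field names (`yards`, `epa`), and the `explosive_threshold` value are all assumptions made for the example; in practice the explosive cutoff would come from the 75th percentile of EPA across the full play-by-play data, and the QBR-based points added would come from ESPN.

```python
def rushing_profile(plays, explosive_threshold):
    """Summarize a list of rush plays into volume and rate stats.

    `plays` is a list of dicts with hypothetical keys "yards" and "epa".
    `explosive_threshold` is an assumed EPA cutoff for an explosive play.
    """
    carries = len(plays)
    total_yards = sum(p["yards"] for p in plays)
    total_epa = sum(p["epa"] for p in plays)
    return {
        # Volume stats
        "carries": carries,
        "total_yards": total_yards,
        "total_epa": total_epa,
        # Rate stats
        "yards_per_rush": total_yards / carries,
        "epa_per_rush": total_epa / carries,
        "success_rate": sum(p["epa"] > 0 for p in plays) / carries,
        "explosive_rate": sum(p["epa"] >= explosive_threshold for p in plays) / carries,
    }


# Made-up rushes for one hypothetical QB.
plays = [
    {"yards": 3, "epa": -0.5},
    {"yards": 12, "epa": 1.0},
    {"yards": 1, "epa": -0.25},
    {"yards": 8, "epa": 0.75},
]
profile = rushing_profile(plays, explosive_threshold=0.9)
print(profile)
```

Running the same function on only a player’s final season would give the third group of metrics.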
Now that we have our metrics, there is just one more thing we have to take care of before we can start clustering: we have to tell the algorithm how many clusters to use. There are a couple of different ways you can go about determining the optimal number of clusters, but to spare you the details, we arrived at 8 clusters.
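For the curious, one common way to pick the number of clusters is the “elbow” heuristic: try several values of k and look for the point where adding another cluster stops reducing the total distance by much. The sketch below is only an illustration of that idea, not necessarily the method used here. It uses six made-up 2-D points and a brute-force search over every possible set of medoids, which is a simple stand-in for a real K-Medoids routine and only practical on tiny data.

```python
from itertools import combinations
from math import dist


def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def best_medoids(points, k):
    """Brute force: try every k-subset of points as the medoids."""
    return min(combinations(points, k), key=lambda ms: total_cost(points, ms))


# Hypothetical (already scaled) metric pairs forming three obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]

costs = {k: total_cost(points, best_medoids(points, k)) for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 2))
```

On this toy data the cost drops sharply up to k = 3 and only creeps down afterward, which is the “elbow” you would look for.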
Results
Even after accounting for the outliers in Lamar Jackson’s college production, he is still in a cluster of his own. Malik Willis is in the cluster right below Lamar Jackson, with fellow 2022 draft prospects Dustin Crum, Desmond Ridder and Matt Corral. The tiers are ranked by each cluster’s EPA from rushing. The beauty of clustering is the observations you can make based on the clusters and the numbers that shape them. I tried my best to name the clusters in a way that would capture the foundation of each cluster in a few words.
It should also be noted that the stats used for clustering were strictly rushing statistics. That means that while players like Carson Strong, Jack Coan or Bailey Zappe aren’t in great rushing clusters, they aren’t doomed prospects. They still have plenty to offer throwing the football.
By clustering prospects based on rushing production, we can see that Lamar Jackson may not be the most apt comparison for 2022 prospect Malik Willis. The question now shifts to “if not Lamar, then who?”.
Similarity Index
In order to isolate Malik Willis and see which previous draft prospect his rushing production most aligns with, we will use something called a similarity matrix. Essentially, we take the metrics we used to cluster, find the distance between Willis’ production and every other prospect’s production, and determine which prospect comes closest to Willis’ rushing production. For added clarity the numbers were transformed into a 0-100 scale, where 100 is an exact match (of course the only exact match to Malik Willis is… Malik Willis! Because we’re all one of a kind).
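Here is a minimal sketch of that idea. The prospects and their numbers are invented, and two choices are assumptions made for the example rather than details from the analysis: each metric is standardized to a z-score before measuring Euclidean distance, and distances are rescaled so the farthest prospect scores 0 and an exact match scores 100.

```python
from math import dist
from statistics import mean, pstdev


def standardize(profiles):
    """Convert each metric to a z-score so no single stat dominates."""
    columns = list(zip(*profiles.values()))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(values, mus, sigmas))
        for name, values in profiles.items()
    }


def similarity_scores(profiles, target):
    """Score every prospect from 0 (farthest) to 100 (identical to target)."""
    z = standardize(profiles)
    distances = {name: dist(z[target], z[name]) for name in z}
    farthest = max(distances.values())
    return {name: 100 * (1 - d / farthest) for name, d in distances.items()}


# Hypothetical (EPA per rush, yards per rush) profiles, not real data.
profiles = {
    "Prospect A": (0.30, 6.0),
    "Prospect B": (0.28, 5.8),
    "Prospect C": (0.05, 3.0),
    "Prospect D": (0.15, 4.5),
}

scores = similarity_scores(profiles, target="Prospect A")
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 1))
```

The target always sits at the top with a score of 100, and the next name down is the closest rushing profile.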
The closest rushing profile to Malik Willis is not Lamar Jackson, but rather former Oklahoma and current Arizona Cardinals star QB Kyler Murray. Lamar lands in the 3rd spot behind Patrick Mahomes, so there is a bit of a comparison, just not the strongest you can make. Considering the names on this list, I would say in terms of rushing Malik Willis is in VERY good company.
Conclusion
Malik Willis is the biggest wildcard among QB prospects this draft cycle. He has the tools necessary for high-ceiling play in the NFL. The question is whether the negatives can be erased from his game or whether they will overshadow his potential. In terms of his rushing ability, he is on par coming out of college with some of the other top QBs in the league. While he most likely isn’t the next Lamar Jackson (nobody will be the Lamar Jackson), he should be able to use his feet effectively in the NFL.
Just to further emphasize it, this is ONLY rushing production. Later on this month, we will put the rushing and passing together to get the best possible look at all of our QB prospects this draft. The best way to make sure you don’t miss it is to hit the subscribe button below (it’s 100% free!)
If you want to dive into the data like I do, check out @CFB_Data and @cfbfastR on Twitter, where you can learn how to get started in the world of College Football data analysis!
If you want to see more charts and one-off analysis, follow my Twitter page, @CFBNumbers