Q: What is SPARQ?

Check out the documentation page I’ve put together.

Q: What is simScore?

Bill James, the man generally credited with the start of the sabermetric movement in baseball, is behind the idea of a similarity score. The wiki page does a good job of explaining the basics, and Baseball-Reference talks about the calculation of their similarity score.

I developed a similar idea: a SPARQ-based similarity score applied to the physical measurables of NFL draft prospects. The idea behind this is that specific athletic profiles have different rates of success. It goes one step further than SPARQ. Instead of asking only “How athletic are you?”, it asks “What kind of athletic are you?” Of course, SPARQ figures into the calculation. The full list of inputs: height, weight, arm length, forty-yard dash, ten-yard split, short shuttle, 3-cone drill, bench press, vertical jump, and broad jump.
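The FAQ doesn’t spell out the simScore formula, so the following is only an illustration of the general idea, not the actual calculation. It assumes each measurable is first converted to a z-score against hypothetical positional means and standard deviations, then treats the Euclidean distance between two standardized profiles as dissimilarity and maps it to a 0 to 100 scale (100 = identical). The key names, the distance choice, and the scaling are all assumptions, and the sketch leaves out the SPARQ term mentioned above.

```python
import math

# Measurables from the simScore input list; the key names are hypothetical.
MEASURABLES = ["height", "weight", "arm", "forty", "ten_split",
               "short_shuttle", "three_cone", "bench", "vertical", "broad"]


def standardize(player: dict, means: dict, sds: dict) -> list:
    """Express each measurable as a z-score against positional peers."""
    return [(player[m] - means[m]) / sds[m] for m in MEASURABLES]


def similarity(a: dict, b: dict, means: dict, sds: dict) -> float:
    """Illustrative 0-100 similarity: 100 for identical profiles, falling
    toward 0 as the Euclidean distance between z-score profiles grows."""
    distance = math.dist(standardize(a, means, sds), standardize(b, means, sds))
    return 100.0 / (1.0 + distance)
```

Standardizing first keeps a 20-pound weight gap from swamping a 0.1-second forty gap simply because the two are measured in different units. Whether a lower or higher number is "better" doesn't matter here, since only the size of each difference counts.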

Q: Does SPARQ take height into account?

No. The inputs for pSPARQ, my NFL version of SPARQ, are as follows: weight, forty-yard dash, ten-yard split, short shuttle, 3-cone drill, bench press, vertical jump, and broad jump.

Q: Well, shouldn’t SPARQ take height into account?

With any metric, you have to establish bounds. It’s difficult to create something that measures everything, because each added layer of complexity further obscures the result. SPARQ is about a few different concepts: how fast you can run, how high you can jump, and how quickly you can change direction. Weight is a critical factor in all of these. Your momentum is different in the 40, and the power you generate is different in the vertical and broad jumps. SPARQ is a weight-adjusted metric because weight directly affects the test results.
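The FAQ doesn’t give the pSPARQ weight adjustment itself, so this is not that; it only puts rough numbers on the momentum point. It uses average speed over the full 40 (an assumption, since speed changes during the run) and compares Vince Wilfork’s 323 pounds and 5.08 from later in this FAQ with a hypothetical 190-pound receiver running a 4.40. The function names are illustrative.

```python
def average_speed(forty_time_s: float) -> float:
    """Average speed over the full 40-yard dash, in yards per second."""
    return 40.0 / forty_time_s


def average_momentum(weight_lb: float, forty_time_s: float) -> float:
    """Weight times average speed, in lb*yd/s (a rough proxy for momentum)."""
    return weight_lb * average_speed(forty_time_s)


# Wilfork's weight and time from this FAQ vs. a hypothetical 190-lb receiver.
nose_tackle = average_momentum(323, 5.08)
receiver = average_momentum(190, 4.40)
print(round(nose_tackle), round(receiver))  # 2543 1727
```

The slower player carries roughly 47% more momentum over the run, which is why a raw 40 time on its own undersells heavier athletes.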

Height does matter, and a taller athlete may be preferable. My personal preference is to adjust carefully for weight and the test results, and then interpret the result in the context of each player’s physical build.

My guess is that the tall, lanky receiver is SPARQ’s biggest blind spot: it doesn’t account for the fact that the tests are harder for Sidney Rice or Josh Gordon to perform than for Brandin Cooks or Bruce Ellington. Metrics don’t need to be perfect if we understand what they capture and what they miss.

Q: How do you decide which results to use when a player participated at both the combine and his pro day?

Well, there’s no perfect method for dealing with this issue. I take the better result in each event and combine the two performances into one composite. So, if a player jumps well at the combine and runs well at his pro day, he gets credit for both, and the poorer results are washed out.
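The best-of composite described above can be sketched as a per-event merge. This is a minimal sketch assuming each session is a dictionary of results with hypothetical event keys, where timed events keep the lower number and reps and jumps keep the higher one. Body weight is left out because it isn’t a result that can be "better", and the `missing_events` helper (also hypothetical) shows which pSPARQ inputs would still be unfilled.

```python
# Hypothetical event keys for the performance tests pSPARQ uses.
# Body weight is left out: it isn't a result that can be "better".
REQUIRED_EVENTS = {"forty", "ten_split", "short_shuttle", "three_cone",
                   "bench", "vertical", "broad"}
# Timed events: a lower number is the better result.
LOWER_IS_BETTER = {"forty", "ten_split", "short_shuttle", "three_cone"}


def best_of(combine: dict, pro_day: dict) -> dict:
    """Merge two testing sessions, keeping the better result per event.

    An event missing (absent or None) in one session is filled from the
    other; events missing from both are left out.
    """
    merged = {}
    for event in set(combine) | set(pro_day):
        a, b = combine.get(event), pro_day.get(event)
        if a is None and b is None:
            continue
        if a is None or b is None:
            merged[event] = b if a is None else a
        elif event in LOWER_IS_BETTER:
            merged[event] = min(a, b)
        else:
            merged[event] = max(a, b)
    return merged


def missing_events(results: dict) -> list:
    """Events still needed before a pSPARQ calculation could be completed."""
    return sorted(REQUIRED_EVENTS - set(results))


# Hypothetical numbers: better forty at the pro day, better vertical at the
# combine, and a bench press recorded only at the pro day.
combine = {"forty": 4.58, "vertical": 35.0, "bench": None}
pro_day = {"forty": 4.49, "vertical": 33.5, "bench": 21}
merged = best_of(combine, pro_day)
print(sorted(merged.items()))  # [('bench', 21), ('forty', 4.49), ('vertical', 35.0)]
print(missing_events(merged))  # ['broad', 'short_shuttle', 'ten_split', 'three_cone']
```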

Now, this does lead to a little bit of inequity. A pro day is a friendlier environment than the one a player encounters in Indianapolis: times are a little faster and jumps are a little higher.

The best-of composite isn’t a perfect system; however, it likely does well enough for our purposes, even if we overrate a player’s speed from time to time, as might be the case with Anthony Barr. We’re not building a rocket ship, and a reasonable change in any one category isn’t going to result in a drastic shift in SPARQ.

I’m not in favor of throwing out pro day data, for a few reasons. First, many players don’t participate in every event at the Combine. It’s often necessary to add a result here or there just to get the data needed to complete a pSPARQ calculation.

I also don’t understand the concept of throwing out data because it doesn’t occur at the same site. It’s data. We can interpret data, perform studies on data, and adjust data. I’d rather try to establish what pro day data means than throw it out altogether. I’d estimate that 80% of the SPARQ results in my database use pro day data. One of the goals I have with SPARQ is to find the late-round SPARQ values, and those guys aren’t even at the combine.

Q: Stephen Hill, Vernon Gholston, and Greg Little were all great athletes. Why do you promote SPARQ when there are so many obvious misses?

It’s a fair question. There are a vast number of players who have tested out well but failed to make an impact in the NFL. The issue here is hit rate. While Wes Welker had a successful career as a marginal athletic tester, there aren’t very many Wes Welkers around. It’s hard to identify that player, even for the best of scouts. I prefer to take my chances with the better athlete, and the data bears this out: good players also tend to be good athletes.

Q: Aren’t you just identifying the tall and fast players? I don’t need a formula to tell me which players are tall and fast.

To some extent, there’s truth here. Fast players are often good athletes!

Still, it’s not really accurate. SPARQ doesn’t even take height into account, as discussed earlier. What SPARQ does for us is dial back the influence of the 40-yard dash.

I look at combine data constantly, and I’d still stutter a bit if you asked me about a given defensive lineman’s broad jump. Because none of us are really great at understanding the bounds of these values, we fall back on the results we do understand. This means the 40 time is heavily emphasized and plays a tremendously large part in shaping the general opinion of a prospect’s athletic potential.

By using a metric like SPARQ and normalizing by position, we can weed out the “40 bias” and have a more holistic understanding of each prospect.

It also isn’t just about tall and fast. By looking at player profiles and success rates, we can determine that some prospects aren’t worth their projected draft position, even considering their plus-athleticism. Similarly, some players who test on the lower side of the scale tend toward an athletic profile that is often successful.

Q: Aren’t the tests imperfect? Times are subjective and depend on the environment/timer.

Yes, they are absolutely imperfect. This is where error dilution helps. As discussed earlier, we can see some differences between the combine time and pro day time for a given player. While some of this may be explained by a friendly stopwatch operator, the reality is that we all have good days and bad days. If that day-to-day variation comes from many small, independent factors, the Central Limit Theorem suggests that repeated test results will be roughly normally distributed. Sometimes we do well, and sometimes we don’t. Most of the time, we’ll fall somewhere near the middle (i.e., our actual ability), but outliers happen.

A composite metric dilutes the importance of any one test. A botched 40 start can’t submarine your SPARQ entirely, and an artificially great one doesn’t make you Calvin Johnson by itself. While each test result may be individually flawed, SPARQ helps wash out much of that inherent error.
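A quick simulation shows why a composite dilutes error. It assumes, purely for illustration, that each test’s day-to-day error is independent and normally distributed with the same spread, and that the composite is a simple equal-weight average; the real SPARQ weighting isn’t described here. Under those assumptions the composite’s error shrinks by a factor of 1/√n, so averaging eight tests leaves a bit over a third of a single test’s noise.

```python
import random
import statistics


def composite_error_sd(n_tests: int, noise_sd: float = 1.0,
                       trials: int = 20000, seed: int = 0) -> float:
    """Std. dev. of the error in an equal-weight average of n noisy tests."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each test's error is drawn independently (an assumption).
        test_errors = [rng.gauss(0.0, noise_sd) for _ in range(n_tests)]
        errors.append(sum(test_errors) / n_tests)
    return statistics.pstdev(errors)


single = composite_error_sd(1)
composite = composite_error_sd(8)
print(f"one test: {single:.2f}, eight-test composite: {composite:.2f}")
```

If errors were correlated (a bad-weather day slowing every event, say), the reduction would be smaller than this idealized case.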

Q: How do I know when a player’s SPARQ is good? What’s bad?

I believe that we need to relate a player’s athletic ability to his positional average. Vince Wilfork ran a 5.08, but weighed 323 pounds and managed to broad jump 8’5″. He’s a great athlete relative to other nose tackles, but his numbers look appalling next to pretty much any non-Jarvis Landry wide receiver.

To this end, I’ve calculated positional pSPARQ statistics that let us express a player’s standing relative to his peers as a z-score. A z-score of 0 means a player is average, while 2.0 means he’s two standard deviations above the peer average. If you’re more comfortable with percentiles, there are online calculators (specify a one-sided distribution) that convert z-scores to percentiles.
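The z-score and percentile conversion can be done without an online calculator. This is a minimal sketch assuming scores are roughly normal within a position: the z-score is the distance from the positional mean in standard deviations, and the one-sided (lower-tail) percentile is the standard normal CDF, computed here with `math.erf`. The positional mean and standard deviation in the example are hypothetical placeholders, not the actual positional statistics.

```python
import math


def z_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviations a value sits above the peer average."""
    return (value - mean) / sd


def percentile_from_z(z: float) -> float:
    """Standard normal CDF: the share of peers expected to score below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical positional stats: mean pSPARQ 110, standard deviation 10.
z = z_score(130.0, 110.0, 10.0)
print(z)                                      # 2.0
print(round(percentile_from_z(0.0) * 100, 1))  # 50.0
print(round(percentile_from_z(z) * 100, 1))    # 97.7
```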

As of February 2015, z-scores are calibrated to the league-average athlete. This means that a 0.0 z-score for a draft prospect represents the 50th percentile on an NFL roster.

Q: What is UQI?

UQI is a measure of how unique a given profile is. I did a brief write-up with the pertinent details: Similarity Scores and Uniqueness Index.