NPFL 3-Combine Statistical Analysis

UPDATE: PDF version of Analysis

Introduction

First and foremost: Thank you for looking at this. The interwebz is full of tantalizing material, so how you stumbled here is beyond me.

Second, but also foremost: I am an engineer, I am long-winded, I failed statistics, and I suck at most things Excel and grammar. I have a lot to learn and I am slowly building my knowledge base in the areas of strength and fitness. You are the one who decides whether this data has value and gets used, or gets ignored. Either way, the nerd in me enjoys looking at the numbers. A lot of my assumptions and references to statistical smart-person terminology will probably be wrong. Let me know if they are, but don’t shatter my heart. Please be nice.

Third, and definitely foremost: Use this data however you would like to, but please don’t plagiarize. My name is Noel, nice to meet you, you honest referencing person. MLA referencing is not necessary, but give me a shout out…so I can have at least more than 4 hits/month on my blog.

Name: Noel Nocas

Twitter: @noelnocas

Blog: nitrostrenth.wordpress.com

Contact me with any questions or if you would like a copy of the RAW data and the spreadsheets I used.

Actual introduction

The NPFL is this pretty amazing idea to further the sport of fitness. They have done a great job of building it from the ground up and bringing in great sponsors. I won’t babble on, but I am excited about it. All this nerd stuff started because I wanted to know what it takes to be an NPFL athlete, or at least competitive at a combine. I wasn’t doing it for myself, because I am a total badass and will probably oust Rich Froning in the Games, plus be the #1 draft pick in the NPFL next year.

I am also delusional – and highly caffeinated.

With the onset of the NPFL combines I was really interested in the caliber of athlete it took to be on that stage. What a perfect stage to collect data from: top athletes from around the country giving it everything they have in a 2 day event to see if they can make a career out of this crazy sport. I love it. So I grabbed the data out of initial curiosity, then I figured out I could make graphs and do little function things in Excel. I am writing this for people from a few different viewpoints:

– As an athlete performing in or wanting to participate in the combine.

– As a “coach” training athletes that wish to make it into the NPFL, or be competitive on that level

– As a programmer wishing to effectively develop an athlete into NPFL “material”

– As a competitive weightlifter/CrossFit athlete that wants to see where the competition is at

Most everything you’ll need to see is in the first table. I will also draw conclusions on how an athlete needs to perform to make it to the third day of competition. If you are interested in a specific event or stat that this doesn’t cover, I will gladly share that data with you, for example the average row time for people who snatched over 275 lbs. Hopefully you can use this for a future combine, to train an athlete, or to program your training. So here it is.

Disclaimer: I would not make it back to the Sunday workout portion of the combine. I’m pretty average in the sport of weightlifting and CrossFit and definitely could not keep up with these guys.

How to read what I am about to puke forth:

Data is from the MALE participants from the three combines (LA, Dallas, ATL). I will also compare all combine athletes to the athletes selected to move on to the next stage, the “Vegas athletes.” To keep this short, I will do one in-depth analysis on the Snatch event. I have all the same data for the other events, but that would make this document way too long. If you want this data for any other event please don’t hesitate to ask.

A lot of the data I collected is analyzed using basic arithmetic equations. I also will reference Standard Deviation (SD) a lot. Standard deviation is “A measure of the dispersion of a set of data from its mean. The more spread apart the data, the higher the deviation. Standard deviation is calculated as the square root of variance.” – Source
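For anyone who wants to see that definition in action, here is a minimal Python sketch. The lift numbers are made up for illustration and are not the combine data, and the original analysis was done in Excel, not Python.

```python
import statistics

# Hypothetical snatch results in lbs (made-up numbers, not the combine data).
snatches = [185, 205, 215, 225, 235, 245, 255, 275]

mean = statistics.mean(snatches)

# Variance is the average squared distance from the mean;
# standard deviation is its square root.
# pvariance/pstdev treat the list as the whole population. statistics.stdev
# would treat it as a sample (dividing by n - 1) and come out slightly larger.
variance = statistics.pvariance(snatches)
sd = statistics.pstdev(snatches)

print(f"mean = {mean:.1f} lbs, variance = {variance:.1f}, SD = {sd:.1f} lbs")
```

Which of the two flavors (population or sample) fits best is a judgment call. With 140 athletes the difference is tiny either way.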

A histogram basically collects data into “bins.” A bin is a range, for example 200 to 225 lbs or 300 to 320 seconds. In each bin there will be a certain number of competitors who performed within that range, e.g. 3 people snatched between 193 and 214 lbs. The number of people in a bin is often referred to as its frequency. For front squat:

In this case 2 people front squatted between 233lbs and 275lbs.

Most of the bins you see are based on standard deviation. The first bin starts at the average minus 3 times the standard deviation. Each bin after that just adds one standard deviation.
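The binning described above can be sketched in Python roughly like this. The function names and the lift numbers are hypothetical, and stopping the bins at the average plus 3 SD is an assumption on my part; the text only says where the first bin starts.

```python
import statistics

def sd_bin_edges(values, n_sd=3):
    """Bin edges from mean - n_sd*SD up to mean + n_sd*SD, one SD apart."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * sd for k in range(-n_sd, n_sd + 1)]

def bin_frequencies(values, edges):
    """Count how many values fall in each [low, high) bin.

    Values below the first edge or at/above the last edge are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical snatch results in lbs (made-up numbers, not the combine data).
snatches = [185, 205, 215, 225, 235, 245, 255, 275]
edges = sd_bin_edges(snatches)
counts = bin_frequencies(snatches, edges)

for low, high, n in zip(edges, edges[1:], counts):
    print(f"{low:6.1f} to {high:6.1f} lbs: {n}")
```

Each printed row is one bin and its frequency, which is exactly what a histogram bar shows.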

Got it? Sorry, I’m trying to make it simple.

IMPORTANT: For normally distributed data, the empirical rule (the 68-95-99.7 rule) states that about 68% of the population will fall within 1 standard deviation of the average…important if you want to make it to the third day of the combine.
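If you want to check that 68% figure yourself, it falls out of the normal curve directly. A small sketch (the function name is mine):

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

This only holds to the extent the real data actually follows a normal curve.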

Because sometimes we have a small data set, we can “normalize” that data, meaning we take a normal curve with the same average and SD and apply it to an imaginary population of (in this case) 2,000 people. These imaginary people follow the same average and SD as the actual data, but they give us prettier graphs. It also allows us to compare a population of 48 with a population of 140 on the same scale…once again with a prettier graph. It can also give us an idea of what the sport of fitness will look like as it grows in size.
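Here is one way the “2,000 imaginary athletes” idea could be sketched in Python. This is an assumption about the approach, not the actual spreadsheet: it spreads 2,000 people across SD-wide bins according to a normal curve. The mean and SD passed in are placeholders, not the combine results.

```python
from statistics import NormalDist

def expected_counts(mean, sd, population=2000, n_sd=3):
    """Expected number of athletes in each one-SD-wide bin of a normal curve."""
    dist = NormalDist(mu=mean, sigma=sd)
    edges = [mean + k * sd for k in range(-n_sd, n_sd + 1)]
    return [
        (low, high, population * (dist.cdf(high) - dist.cdf(low)))
        for low, high in zip(edges, edges[1:])
    ]

# Hypothetical mean and SD in lbs (placeholders, not the combine results).
rows = expected_counts(mean=230, sd=27)
for low, high, n in rows:
    print(f"{low:5.0f} to {high:5.0f} lbs: about {n:.0f} athletes")
```

Because both groups get scaled to the same 2,000 people, a group of 48 and a group of 140 can be drawn on the same axes and compared bar for bar.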

21 events, 210 athletes

210 athletes participated in three separate combines. They had the option of choosing between 21 events on the first day. Here is how they did.

List of events:

  • Front Squat – 1RM
  • Press (strict) – 1RM
  • Overhead Squat – 1RM
  • Jerk – 1RM
  • Deadlift – 1RM
  • Clean & Jerk – 1RM
  • Snatch – 1RM
  • Clean – 1RM
  • Chest-to-Bar Pull-ups – Max reps in 90 seconds
  • Handstand Push-ups – Max reps in 90 seconds
  • Muscle-ups  – Max reps in 90 seconds
  • Box Jumps (30/24″) – Max reps in 60 seconds
  • Pistols – Max reps in 45 seconds each leg
  • Double-unders – Max reps in 90 seconds
  • Rope Climbs – Max reps in 90 seconds
  • Handstand Walk – Max distance in 60 seconds
  • Farmer Carry (155/67lbs) – Max distance in 60 seconds
  • Shuttle Run – Max distance in 60 seconds

The following two stations are timed efforts:

  • 5 Rounds of 12 Power Clean and 6 Push Press (165/105lbs) – 12 min cap
  • Row 1k

    *Athletes will have eight hours to make their way to any/all of the above stations in whatever order they choose.

Event Participation

Here is how the numbers panned out for the number of athletes that participated in each event.

After doing a lot of the calculations and drawing some basic conclusions I was informed that in the [ATL] combine, athletes were informed “what coaches wanted to see on the tests. They didn’t care if you did all the events but they told us they wanted to see specifically front squat, snatch, 1k row, power clean met con, muscle-ups, and I think pull-ups was the other.” -Super secret source.

This explains the high density of athletes that performed these events. This is fine, because it gives me more data to choose from, but it adds a different element to the statistical analysis. For example: which events are most likely to get you to the second day?

64 Athletes made it to Vegas – Here is the data:

How does that compare to all the combine athletes? As expected, averages are better across the board.

210 athletes competed in an average of 8 events.



Snatch – 1RM

As promised: the snatch data. I am a bit biased here because I like weightlifting. As mentioned above, if you would like the data for other events let me know. If there is enough demand I will put together a larger document/post with all the data.

140 participated in the snatch:

Of those 140 athletes, here are the frequencies divided into small bins:
[Figure: Weight Snatched Histogram]

Here the bins are selected by the standard deviation (SD).
[Figure: snatch histogram with SD-based bins]

Then I went on to spread the data normally over 2,000 “imaginary athletes” with the same average and SD. All it does is give us a pretty graph of how the data would look if it trends normally (red, below). The blue graph is the actual data (from the bins above). This can give some great indicators of competitive athlete strength numbers.

Let’s compare all the combine athletes with just the athletes that made it to Vegas. This is normalized so we can visualize the combine athletes and Vegas athletes 1:1. The benefit is that we can see what the numbers would look like if 2,000 combine athletes went head to head against 2,000 Vegas athletes.

[Figure: normalized snatch distributions, all combine athletes vs. Vegas athletes]

This graph is cool because it shows us that as more and more athletes are selected for future combines, we will most likely see a snatch in the 340+ range (statistically.) It also shows that the stronger weightlifters made it to the next stage, and that the density of strength was much higher in the 250 – 280lb range.

Now what… and some discussion

In my mind all these numbers give baseline strength/performance indicators which can be used as goals for a developing athlete. In addition, I believe they can aid in strategy when performing in a combine, hopefully yielding an invitation to the third day. We all ask the question “how in shape do I need to be to make regionals, or make the CrossFit Games?” Similar to the Open/regionals, I am asking what an athlete needs to do to make it to the next stage, without the semi-randomness of the Open or regionals clouding the air.

A note on self-selecting data:

“In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample [1] of a population (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected.[2] If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.” – WIKI

This data is absolutely a biased sample; it is self-selected and is not a probability sample. This is not a population sample or a “Gallup Poll” style survey. We aren’t pulling people off the street at random here. Individuals are invited or are selectively chosen to participate, and therefore none of the data drawn from the competition can fall under a probability sampling definition.

For example, let’s take a look at the Boston Marathon. If we were to run basic statistical calculations on all the participants of the race and say the average time is X and the standard deviation is Y, then we could use those numbers to see how an individual performed, or predict how an individual will perform next year…or even the probability that they will finish in the top 10%. The only reason we can run these numbers to predict how you will do in the marathon is because people decided (who knows why) to run the marathon. Similar to the NPFL, people decide to do an event (probably those who are more competent in that area) so we can pull data from it. The NPFL is pulling in the best competitive exercisers from around the country and is doing it in a way where individual performances can be measured discretely and mutually.

Addressing those that did not do the events and how that affects the results:

Why would someone not compete in an event?

1. Other priorities (what the coaches want to see)

2. Weak in that area (less than the Average minus the SD)

3. Injury of some sort.

4. You can’t do all the events.

Let’s take a look at the snatch: 70 athletes chose not to participate. I don’t know why. But we can assume that if they had a top 10% or even an above average snatch, they probably would have done it. Adding in the numbers from the 70 athletes would probably lower the average (slightly). It is hard to speak to the standard deviation, but I would assume it would be similar (perhaps a bit larger, but not tremendously different).
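To put a rough size on “lower the average (slightly),” here is a small sketch of how a pooled average works. The head counts (140 and 70) come from this post; both averages are hypothetical placeholders, since nobody knows what the 70 would have snatched.

```python
def combined_mean(n_a, mean_a, n_b, mean_b):
    """Mean of two groups pooled together, weighted by group size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

did_snatch = 140      # athletes who snatched (from the post)
skipped_snatch = 70   # athletes who did not (from the post)

# Both averages below are hypothetical placeholders in lbs.
observed_mean = 230
assumed_skipper_mean = 215

pooled = combined_mean(did_snatch, observed_mean, skipped_snatch, assumed_skipper_mean)
print(f"pooled average: {pooled:.1f} lbs")
```

Since the 70 are one third of the 210, the pooled average moves by one third of whatever gap you assume between the two groups.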

“Noel, why didn’t you do an analysis on the players that made it to Sunday? Doesn’t this render all your conclusions on how to make it to Sunday totally useless?”

Frankly this would have been a lot of work. NPFL did not release an official list of those that moved on to Sunday from each combine (to my knowledge). I would have had to comb through every race summary to get some of the athlete names, and I would probably still have missed most of the athletes that made it to Sunday. Does this mean the numbers from the first day are “non-factors?”

I would initially agree to this on the surface. But let’s look at the numbers.

UPDATE:  My previous edit used basic statistical calculations to place a “probability of reaching the 3rd day of the combine value.” After some input from the community I have chosen to remove that data as it was very misleading. I did not know the numbers of who made it to Sunday, yet still placed a statistical number to them and spoke of them as holding truth. Thank you to those that helped me understand the mistake I made.

  • Up to 48 male competitors from each combine are invited to the next stage. That is at most 144 athletes out of the three combines (210 total athletes), which means that up to 68.6% of the athletes are invited to Sunday. So actually, more of the athletes are invited than not invited to Sunday.
  • I don’t know who was invited to Sunday out of the 210 athletes. But up to 68.6% were, and 67% of all 210 athletes (140) competed in the snatch. In comparison only 22% competed in the strict press. Does this mean if you perform the snatch you have a greater chance of making it to Sunday? No it doesn’t. Again, I have not compiled data for those that made it to Sunday (besides Vegas athletes).
    • 75% of Vegas athletes performed the snatch, 20% performed strict press.
  • What does this say?  I don’t know really. NPFL coaches’ priorities, athlete priorities, strengths/weaknesses of athletes, etc… Does it mean that performing in the strict press event, even near the average, is less likely to earn you a golden ticket for Sunday? No it doesn’t.
  • Remember, these are STATISTICS. They can be used to draw a hypothesis which can be tested against data. Technically it is illogical and incorrect to say if you press rather than snatch you have a lower likelihood of making it to Sunday. These are skewed because many of the athletes are doing what the coaches wanted to see.
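The percentages in the bullets above are simple division, and a quick sketch also shows why multiplying them together does not answer the “will I make it to Sunday” question. The counts come from this post; the variable names are mine.

```python
total_athletes = 210          # men across the three combines
invited_max = 48 * 3          # up to 48 men invited back per combine
snatch_participants = 140     # men who did the snatch

invited_share = invited_max / total_athletes
snatch_share = snatch_participants / total_athletes
print(f"invited to Sunday (at most): {invited_share:.1%}")
print(f"did the snatch:              {snatch_share:.1%}")

# Multiplying the two shares gives the chance that an athlete did the snatch
# AND was invited, and only if the two things are independent. It is NOT the
# chance of being invited GIVEN that the athlete snatched. Under independence
# that conditional chance is just the invited share again.
joint_if_independent = invited_share * snatch_share
print(f"snatched and invited, if independent: {joint_if_independent:.1%}")
```

Working out the real conditional chance would need the actual list of who was invited to Sunday, which is exactly the data that is missing.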

The empirical rule for normally distributed data (basically) states that about 68% of the population will fall within 1 standard deviation of the average. It just so happens that up to 68.6% of the athletes make it to Sunday. Yes, this is the NPFL selecting 48 people per combine to come back, so the match is a coincidence of sorts. However, I believe that performing within one standard deviation of average for any one event is in your benefit.

So how do you make it to Sunday?

  • You get selected to perform on Sunday based on your performance. That simple.

I think you can do things to maximize your performance over the course of the combine. Your likelihood of being selected for Sunday is going to be in direct relation to how you perform.  Here is what I can offer:

  • Have fun. You have trained for this, so feed off of the energy and let that propel you to PRs.
  • Smart Choices: Please don’t do 20 events. If you did make it to Sunday who knows how you would feel.
  • Be true to yourself: Make event choices that showcase your athletic potential and what you can bring to the table.
  • You may be told that the coaches want to see certain events more than others. If you perform these, think carefully about how each one will affect the result of the next.
  • Look at the numbers from other combines. If you think you can perform an event higher than average, or at least within 1 standard deviation below average, this will be in your benefit.
  • Meet new people and develop connections. Ask for advice from other athletes or share a trick you have. Our community thrives on this. Competition is not your enemy.

SUPER IMPORTANT UPDATE:

Puppy, because you deserve it for reading all of this (and for the ladies):

puppy2


12 comments

    1. Justin,
      Awesome website!! I’ll have to nerd out for a few hours and read your posts. Looking forward to it. You’re definitely not alone out there, and I promise I am no competitor.

      That is what the last table is for. Since I didn’t have the names of people that made it to Sunday, the basic statistics say that if X% of people did the snatch and Y% of people made it to Sunday, then X*Y% is the probability that if you did the Snatch event you would make it to Sunday. Completely false in reality because it doesn’t take performance characteristics into account. However it was quick and dirty. After my conclusion that if you perform within 1 SD of average or above you will make it to Sunday, I looked at several athletes that I know made it to Sunday (all Vegas athletes and a few I knew personally). All these athletes did perform within that characteristic.

      I have all the data in between!

      Thanks,
      Noel

      1. Right; I understand the x*y% = empirical probability of reaching Sunday’s competition. I wasn’t quite specific enough because of my excitement…

        Your probabilities are very simple, but there are models that one could construct that could indicate, (1) which events are most important in qualification more explicitly by using all of the events in one go, instead of 21 separate models for 21 separate events. And (2), how individual performance in each event affects qualification probability. For instance, an athlete performs below -1 SD from the mean on event 2, but performs in the top 2% in event 3; does event 2’s performance doom him from qualifying? This would require the in-between data.

      2. Justin, great questions. Let’s carry this conversation offline (and by offline I mean online, by email). That way we can let the nerd stuff fly and perhaps shoot numbers back and forth. Plus I will need your help with half of it =) (just a petroleum engineer here, no formal statistics experience).

        Thanks!

  1. Cool analysis. I’m a computer/mobile/online game developer and am trying to create a fantasy system built around the NPFL — have already approached them with ideas but would like to follow up with an actual mathematical model. If you’d like to collaborate on such a project, drop me a line at @QuicksilverSoft.

  2. Any chance we can get some love for the female athletes? I’d love to see the trends on that side. (And thanks for the puppy!)

    1. Absolutely! Looking to get it done next week. There were just more male athletes to choose from so my initial response was to grab their data.
