CrossFit Meets Data Analysis


It’s been quite a while since I’ve updated. I’ve had lots to post about, but other things have taken priority. Today’s update is going to be a little different, but I think it’s pretty fun stuff.

Any of my readers who know me personally know that besides being obsessed with computers and research-related things, my other passion is fitness and sports. In particular, I’m really into something called CrossFit. It would take an entire post to describe CrossFit in detail, so here’s my very short explanation: it’s the sport of exercise.

One of the cool things about this growing “sport” is that there is a global competition called the CrossFit Games. The games are kind of like the Olympics, but for CrossFit athletes. To qualify, you have to compete in two different levels of competition (sectional and regional), and place high enough to secure one of only 50 spots.

This past weekend, I had several friends competing in the Canada-wide regional event. Only six men and six women from all of Canada get to go. Unfortunately, none of my friends made it (full results).

While observing the events, I got this idea about taking the results from the various events and doing some data analysis on it to see how the different athletes compare. I’ve been doing something similar in my regular research work, but for entirely different data. I decided to experiment with applying a similar clustering and statistical analysis approach, but to the CrossFit regional data!

To start with, I wanted to try to classify athletes based on the similarity of their performances across the different events. Some athletes may be great runners, but may not be super strong. So there should be different classes of athletes, each with its own distinguishing characteristics.

To do this, I downloaded the data from the CrossFit Games site and wrote a quick Java program to clean it up a bit. For my analysis, I considered each of the 5 events as a different feature describing a particular athlete. For example, my friend Cam Birtwell would be described by the vector [25, 145, 793, 282, 776]. He placed 25th in the run (only placements, not times, were available for the run), lifted 145 lbs in the second event, and the last 3 numbers are his times in seconds to complete each event.
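Since the five features mix units (a placement, pounds, and seconds), the columns need to be put on a comparable scale before any distance-based clustering; tools like Weka can normalize attributes for you. Here's a minimal standalone sketch of that step in Python. Only Cam's vector comes from the real data — the other two athletes are made up for illustration:

```python
# Standardize each event column to zero mean / unit variance so that
# placements, pounds, and seconds become comparable before clustering.
from statistics import mean, pstdev

# Cam's vector is real; "Athlete B" and "Athlete C" are invented examples.
athletes = {
    "Cam Birtwell": [25, 145, 793, 282, 776],
    "Athlete B":    [3, 185, 650, 240, 700],
    "Athlete C":    [40, 135, 900, 310, 850],
}

def standardize(vectors):
    """Return z-scored copies of the feature vectors (column-wise)."""
    cols = list(zip(*vectors.values()))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return {
        name: [(x - mu) / s for x, mu, s in zip(vec, mus, sigmas)]
        for name, vec in vectors.items()
    }

scaled = standardize(athletes)
```

After this step, a one-unit difference means "one standard deviation" in every event, so no single event dominates the Euclidean distances that K-means relies on.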

Once I had vector descriptions for each athlete, I then used the Weka data mining tool to perform K-Means Clustering on the data. K-Means tries to divide a given data set into K different clusters or classes. You have to choose K up front. Since I didn’t know what a good K would be, I varied K from 2 up to 10, then used mathematical formulas to calculate the compactness and separation of the clusters I got. Based on these metrics, K=5, K=7, and K=8 appeared to be the best.
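The K-sweep idea can be sketched in plain Python. My actual analysis used Weka's SimpleKMeans, and the post doesn't name the exact compactness/separation formulas, so here within-cluster sum of squares (WCSS) stands in as the compactness metric, on a toy two-blob data set:

```python
# Toy K-means plus a compactness measure (within-cluster sum of squares),
# illustrating the "sweep K and compare cluster quality" procedure.
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Mean point of a non-empty cluster."""
    return tuple(sum(xs) / len(xs) for xs in zip(*cluster))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign to nearest centroid, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        new = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, clusters

def wcss(centroids, clusters):
    """Compactness: total squared distance of points to their centroid."""
    return sum(dist2(p, c) for c, cl in zip(centroids, clusters) for p in cl)

# Two well-separated blobs: WCSS should drop sharply going from K=1 to K=2.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
```

Sweeping K and watching where the compactness improvement levels off (the "elbow") is one common way to pick K when, as here, you don't know the right number of classes up front.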

After some exploration of the data, I decided K=7 seemed the most interesting and made the most sense. This means that out of the 50 or so male athletes competing in the Canada Regional, I split them up into 7 different groupings.

Once I had this data, I wanted to know why a particular athlete, like Cam, was assigned to a particular cluster. To determine the distinguishing characteristics of each cluster, I performed multiple Analysis of Variance (ANOVA) calculations to determine the statistically significant features of each cluster. That is, for each of the five features/events, I performed ANOVA to check for a statistical difference in that particular feature when compared across all 7 clusters.
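For each event, one-way ANOVA boils down to comparing between-cluster variance against within-cluster variance. A small self-contained sketch, with made-up score groups rather than the real regional data:

```python
# One-way ANOVA F statistic for a single event's scores grouped by
# cluster: between-group variance divided by within-group variance.
from statistics import mean

def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score groups."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between-group
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within-group
    return (ssb / (k - 1)) / (ssw / (n - k))

# Event times (in seconds) split across three hypothetical clusters.
f_stat = anova_f([[700, 720, 710], [650, 640, 660], [800, 790, 810]])
```

Comparing the resulting F value against the F distribution with (k − 1, n − k) degrees of freedom gives the p-value; in practice something like scipy.stats.f_oneway does both steps in one call.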

For most events, there were statistically significant (p < 0.05) differences. The event with the least difference was event 4, the double-under, burpee event (p = 0.03). This was such a fast event that it seems every male athlete was quite competent at these skills, so there was little variance.

ANOVA simply told me that there were statistically significant differences between clusters, but not where those differences actually occur in the clustering. To determine this, I used the Tukey range test, which compares all pairs of means for a given independent variable (i.e. the event scores). Using these statistically interesting differences, I summarize the cluster characteristics below.
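The core of the Tukey test can be sketched as comparing every pair of cluster means against an "honestly significant difference" threshold, HSD = q · sqrt(MSW / n). The critical value q normally comes from a studentized-range table; the q used below is an illustrative placeholder, and the groups are the same toy data as before:

```python
# Sketch of the Tukey range test: flag pairs of clusters whose mean
# scores differ by more than HSD = q * sqrt(MSW / n).
import math
from itertools import combinations
from statistics import mean

def tukey_pairs(groups, q):
    """Return index pairs of groups whose means differ by more than HSD.

    Assumes equal group sizes; q is the studentized-range critical value
    (normally looked up from a table for the chosen alpha, k, and df).
    """
    k = len(groups)
    n = len(groups[0])
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (k * n - k)
    hsd = q * math.sqrt(msw / n)
    return [(i, j) for i, j in combinations(range(k), 2)
            if abs(mean(groups[i]) - mean(groups[j])) > hsd]

groups = [[700, 720, 710], [650, 640, 660], [800, 790, 810]]
significant = tukey_pairs(groups, q=4.34)  # q=4.34 is a placeholder value
```

The pairs that clear the threshold are the ones you can describe with labels like "faster" or "stronger" in the cluster summaries that follow.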

Cluster Characteristics
Cluster 1: Fast runner, not a strong lifter, slow through events 3 and 4, and average in event 5.
Cluster 2: Fast runner, average lifter, fast through events 3, 4, and 5. All-around athlete, with a speed focus.
Cluster 3: Average runner, heavy lifter, fast through events 3, 4, and 5. All-around athlete, with a strength focus.
Cluster 4: Slow runner, weak lifter, slow through events 3, 4, and 5.
Cluster 5: Average runner, low-to-medium lifter, slow in event 3, fast in event 4, and average in event 5.
Cluster 6: Slow runner, medium lifter, slow in event 3, fast in event 4, and average in event 5.
Cluster 7: Slow runner, average-to-heavy lifter, average in event 3, fast in event 4, and average in event 5. Strong, but needs to work on running.

The following table gives a breakdown for each cluster: the highest placement for an athlete within that cluster, the lowest placement, the average placement, and the size of the cluster.

Cluster    Highest  Lowest  Average  Size
Cluster 1       24      45       35     4
Cluster 2        1      23       11    13
Cluster 3        6      15    10.83     6
Cluster 4       42      49     46.4     5
Cluster 5       13      43     28.6     9
Cluster 6       29      41     36.4     5
Cluster 7       20      40     30.3     7

That’s the raw data, but is there anything interesting we can say about this? It turns out there are a number of interesting results.

First, the top 13 athletes all fall into cluster 2 or 3. Also, of the top 6 athletes (the ones going to the games), 5 of the 6 are from cluster 2, while one is from cluster 3 (DJ Wickham). Based on this, I’d guess that the 5 events favored an all-around athlete with a speed focus more so than an all-around athlete with a strength focus. If you look at the events, most did not involve heavy weights. It could be that with a different balance in the events, for example, a 165 lb clean and jerk in the last event instead of 135 lb, the cluster 3 athletes may have been more likely to dominate the top 6.

One very obvious result from this is that you clearly need to be a great all around athlete to crack the top 15. With 5 events, your weaknesses are going to get exposed. Everyone in this competition is most likely a great CrossFitter, so it’s extremely difficult to come back from even one poor score.

I think it would be interesting to take this analysis further by looking at clustering data from other regionals and seeing the similarities between the athletes that made the games. It would also be interesting to use the analysis to compare the programming for the various regionals. What regions favor certain types of athletes?

Anyway, that’s all I got for now. Hopefully some people find this interesting and I hope no one that was included in the analysis is offended by my categorization. There’s some subjectivity to the whole thing, so don’t take it too seriously :-).

About the author

Sean Falconer

15 Comments

  • Wow that's a neat look at the Regional Sean, nice work! I always find it interesting to see what type of athlete is selected for by the events within a given competition.

    Looking at the athletes from the Canadian Regional, 7 out of the top 9 were 6' tall or above. Garth is about 5'9" but has amazing aerobic and anaerobic power. This seems contrary to most CF competitions where shorter guys (5'8-5'10) seem to be the most dominant.

    It looks also like the tire flip – c+j – run WOD was the most predictive of overall finish but that was also the only WOD in which people were grouped according to their overall standing at that point. Being in a faster heat with prior knowledge of what constitutes a good time in an event can definitely push towards better times.

    It was also interesting to me that a guy who finished well back in the run (DJ) could still make it in by beasting through the other workouts.

    Anyways, cool look at the numbers, I like this sort of stuff!

    cam
    PS where do I fit in cluster-wise?

  • Hey Cam,

    It would be interesting to take into account other features to make the clustering better, but I just worked with what I had. Like, I could take into account height, weight, perhaps scores on benchmark workouts, etc.

    Maybe part of the reason for the height differential is the make-up of the WODs. Wallballs and tire flips typically favor a taller individual. There wasn't really a typical exercise that shorter guys dominate, like air squats.

    DJ was pretty amazing to come back from 38th. I guess Garth and Michael had similar comebacks from the snatch complex, but not quite so dramatic. Those three are perfect examples of elite CrossFitters with slightly different strengths. Michael and Garth have speed and endurance (1 and 2 in the run), while DJ has the edge on power and strength.

    You were in cluster 5 and quite obviously, Lucas was in cluster 3.

  • This is really, really cool. Being tall and slim, I always am interested in the relative wod difficulty or bias correlated to size/shape. Really interesting to correlate it on the endurance/strength bias generally. I'm waiting for my optimized wod on MFT 🙂 Rowing, wallballs, tire flips and deadlifts or something like that. Not an air squat or HSPU in sight !!

  • I'm going to nerd out here and just note the similarity between your clusters and RPG character traits.

  • Hey Sean, if you get a moment, could you explore a different scoring system? I have no statistics background besides Stat101 my first year of undergrad, but Stats and Scoring are both pretty logical so I can understand them for the most part.
    Anyway, I have an issue with both Rank-based scoring (doesn't reward margin-of-victory or defeat) and Proportional scoring (some events have tighter distributions than others — not necessarily because the athletes are more equal, but because the event just yields tighter distributions). My solution to that was to make a pseudo-proportional system that is based on the Median score, but takes into account Standard Deviations.

    The problem I noticed with that is that an outlier might throw things off — if 10 competitors deadlift between 400-600lbs, after normalizing the event to a median of 100, the difference between first and last might be 60 points. But, if an 11th competitor deadlifts 1000lbs, the difference between the 400lb Deadlift and the 600lb lifter might shrink to around 25. My solution to this is to ignore outliers when determining the Standard Deviation.

    I would be interested to see if you had any advice on how to change things, or any instances where the scoring system does not accurately judge performances.

    It is attached here:
    http://www.board.crossfit.com/attachment.php?attachmentid=6661&d=1280165725

    Thanks so much.

  • Hi Justin,

    So what you describe is quite similar to the Standard Score (http://en.wikipedia.org/wiki/Standard_score). I talk a bit about this in this post: http://seanfalconer.blogspot.com/2010/06/crossfit-alternative-perspective.html

    The major difference is that you are throwing out outliers. Now, this is a pretty standard data analysis technique, however, there's tons of controversy about how to detect an outlier and whether they should indeed be removed or not. I see you are doing something with quartiles to figure out the outliers.

    In data analysis, often the general rule is if the outlier is there but shouldn't be (i.e. recording error, bad data, etc), it should be removed. But when the outlier is legitimate, it's much more controversial whether it should be removed or not.

    There are other ways of dealing with outliers besides removal, like transformations on the data or bounding the value of elements by replacing extreme highs and lows with more sensible values of high and low. However, these methods are all application specific.

    Now, I'm not really answering your question with all this, but I guess my point is these things can be tricky :-).

    From a casual observer point of view, I don't see why something similar to your approach wouldn't work. One argument against it from CrossFit HQ's point of view might be the lack of transparency. The rank-based system, although flawed, is simple to understand.

    I'd love to see the people in charge of the games consider some alternative methods rather than blindly insist that the rank-based system is "perfect" (see http://games2010.crossfit.com/blog/2010/07/scoring-games,707/)

    Interestingly, if you look at this year's games data for different scoring systems, in a proportional, lowest converted points, and standard score system, Spealler wins.

    Check it out:
    http://keg.cs.uvic.ca/seanf/crossfit.php

    Cheers,

    Sean
