Saturday, April 19, 2014

A Look Back at the 2014 Open: Part I

As I did last year around this time, today I'll be starting my review of the 2014 CrossFit Games Open. Of course there will be limitations to what this analysis will be able to cover, partly due to the data I'm able to get at this point (I am hoping to eventually pull down all the individual stats, like Fran time, max deadlift, etc.). Still, I think there is enough data out there to help us further our understanding of the current state of our sport and where we may be headed. Like last year, I'll be breaking this post up into two parts. To start, here is a list of topics I plan to cover, followed by a list of things I will not be touching on in this post:

Will cover:
  • Breakdown of the programming of this year's Open, much like my "What to Expect from the Open" posts from fall 2012 and fall 2013 (Part I)
  • Correlations between events this year, compared with last year (Part II)
  • Comparison of performance by new competitors vs. returning athletes (Part II)
  • Comparison of 11.1 and 14.1 results (Part II)
  • Attrition in this year's Open, compared with past years (Part II)
Will not cover:
  • Comparison between regions (don't have region information in the data at the moment)
  • Breakdown by age group (don't have age information, either)
  • Predictions for regionals (coming in the next few weeks)
  • Probably lots of other subjects that I simply didn't think of. If you have suggestions for future analysis, by all means, post to comments or email me.
Finally, here are some notes on the data set I am using for any work dealing with the results of the Open (thanks again to Andrew Havko for helping me pull this data, along with the 2011 Open results, which I've been wanting to get my hands on for some time):
  • Excluded any athletes who did not complete all 5 events. This simply makes for fairer comparisons. I did look at all scores in order to calculate the number who dropped off each week, but that is it.
  • Masters competitors (54 and under, since older groups are scaled) are lumped in with everyone else. As mentioned above, I don't have age information in this dataset, since I pulled it straight off the worldwide leaderboard.
  • I have re-ranked athletes on each event among the athletes in this dataset.
  • Athletes were identified as returning athletes if their full name was in last year's dataset. There are multiple athletes with the same exact name, but I had no way around this without region or age information. I assume any impact here is minor. The one manual fix I made was to make sure the Ben Smith at the top of the leaderboard was matched up with the correct Ben Smith from last year's data. 
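For anyone who wants to build a similar dataset, here is a minimal Python sketch of the three cleanup steps above. It is not necessarily how this data was actually processed. It assumes a hypothetical record layout where each athlete is just a name plus five worldwide event ranks, with None for a missing score. Because the raw scores aren't in that layout, the sketch re-ranks on the original leaderboard ranks, which preserves the order and the ties of the underlying scores. The returning-athlete flag is a plain exact full-name match, so it has the same duplicate-name weakness described above. A manual fix like the Ben Smith one would have to be applied separately.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

NUM_EVENTS = 5  # 14.1 through 14.5


@dataclass
class Athlete:
    # Hypothetical record layout: a name plus one worldwide rank per event
    # (None means no score was submitted for that event).
    name: str
    event_ranks: list[Optional[int]]


def rerank(values: list[int]) -> list[int]:
    """Competition-style ranks (1 = best); tied values share the best rank."""
    first_position: dict[int, int] = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


def build_dataset(athletes: list[Athlete], last_year_names: list[str]) -> list[dict]:
    # Step 1: keep only athletes with a score on all five events.
    kept = [
        a for a in athletes
        if len(a.event_ranks) == NUM_EVENTS and all(r is not None for r in a.event_ranks)
    ]
    # Step 2: re-rank each event among the kept athletes only.
    per_event = [rerank([a.event_ranks[e] for a in kept]) for e in range(NUM_EVENTS)]
    # Step 3: flag returning athletes by exact full-name match against last year.
    known = set(last_year_names)
    return [
        {
            "name": a.name,
            "ranks": [per_event[e][i] for e in range(NUM_EVENTS)],
            "returning": a.name in known,
        }
        for i, a in enumerate(kept)
    ]
```

Ranking ties the way the leaderboard does (tied scores share the better rank) matters here, since many athletes post identical rep counts.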
So let's get started.

We'll start with the programming this year. I generally liked the programming this year (although it certainly didn't play to my strengths), mainly because HQ finally threw us some curveballs and some workouts that I didn't expect. Among the things we saw this year that hadn't occurred in previous Opens:

  • Rowing
  • A workout for time (rather than AMRAP)
  • A workout with more than 3 movements
  • Weights over 300 pounds for men and 200 pounds for women
  • Pull-ups and thrusters not in the same workout

Having said that, the Open is still the Open, and many things remained the same as or similar to prior years. For instance, the loading was still much lower than at the Regional and Games level. In fact, by my measurements, this was the lightest Open yet. Below is a basic comparison of the average loading* used each year in the men's competition (the pattern is the same for women).

The average relative weight was down from all prior years and there was less than 50% lifting, which meant that the load-based emphasis on lifting (LBEL) was down about 10% from the historical average**. An investigation for another day is whether this Open favored smaller athletes because of that lower LBEL.
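The exact definitions of these metrics live in the earlier post cited in the footnote, so treat the following only as a rough sketch under a stated assumption: that LBEL is the share of season points that come from lifting, multiplied by the average relative weight of those lifts. That assumption is consistent with the reasoning above (less lifting and lighter loads both pull LBEL down), but it may not match the original formula. The relative weights in the usage example are made up for illustration.

```python
def lifting_metrics(workouts):
    """Return (share of points from lifting, average relative weight, LBEL).

    ASSUMPTIONS (not the original definitions): each workout is worth
    1/len(workouts) of the season, split equally among its movements, and
    LBEL = lifting share x average relative weight.

    workouts: list of workouts, each a list of (movement, relative_weight)
    pairs, with relative_weight = None for non-lifting movements.
    """
    n = len(workouts)
    lift_share = 0.0     # fraction of season points that come from lifts
    weighted_load = 0.0  # sum of (point share x relative weight) over all lifts
    for movements in workouts:
        per_movement = 1.0 / (n * len(movements))
        for _name, rel_weight in movements:
            if rel_weight is not None:
                lift_share += per_movement
                weighted_load += per_movement * rel_weight
    avg_rel_weight = weighted_load / lift_share if lift_share else 0.0
    return lift_share, avg_rel_weight, lift_share * avg_rel_weight


# Hypothetical two-workout season with made-up relative weights.
share, avg_weight, lbel = lifting_metrics(
    [[("burpee", None), ("thruster", 0.6)], [("deadlift", 0.8)]]
)
```

Under this assumption, LBEL reduces to the point-weighted sum of the lifts' relative weights, so it drops whenever either the lifting share or the loads drop.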

Now let's take a look at which movements have been used across the three years, and how they have been valued. This is presented slightly differently than last year: each value represents how much that movement was worth as a percentage of the total for that season. The reason for presenting it this way (as opposed to counting total events) is that 2011 had six events and the other years had five, so this accounts for the fact that each event was not worth as much in 2011.

We see that this year, the programming hit almost every movement that has been used at any point in the past (push-ups and jerks were the exceptions) and added one new movement (rowing). Not surprisingly, we see the same movements being emphasized as in prior years: snatches, burpees, thrusters and pull-ups. One interesting note is that this is the first year in which no movement has accounted for more than 10% of the total points. However, the caveat here is that this methodology assumes all movements in a given workout are valued equally. In reality, there are instances where this is not necessarily true: for instance, most people would agree the deadlifts were valued far more than the box jumps in 14.3.

As in past years, you'll also notice that the Olympic-style lifts and derivatives (thruster, overhead squat), as well as basic gymnastics movements, were the biggest keys to Open success. However, we did see a bit more value placed on other areas, such as powerlifting (e.g. deadlift) and pure conditioning (double-unders, rowing). Although Castro did surprise us with a few things this year, I still think it's a safe bet that the more advanced movements you might see at the Games (e.g. ring handstand push-ups, heavy medicine ball cleans) are not going to be tested in the Open. That's not to discount the usefulness of these other skills in training; it's just that you're not likely to see them tested until at least the regionals.

Finally, here's a chart I put together showing the relationship between loading, the number of movements, and the length of workout in the past three years of the Open. This chart was shown last year, but I have added the 2014 workouts, which are represented by the red balls. In the chart below, the x-axis represents the time domain, the y-axis represents the number of movements***, and the size of each bubble represents the LBEL of that particular workout (roughly, how "heavy" each workout was). A plus symbol indicates the weight varied during the workout, and the arrows indicate the time varied for the workout****.

This year's workouts, although unique in how they were programmed, still didn't stray too far from what we've seen in the past in terms of loading and time domain. Keep in mind that for the above chart, I'm using averages for the variable-weight event (14.3) and variable-time events (14.2 and 14.5). Many CrossFitters did see a very long workout in 14.5 (which took many people beyond 20 minutes) and a very short workout in 14.2 (which only lasted 3-6 minutes for a majority of the field).

You can still observe some general trends here, which are often true of CrossFit programming in general. The shorter workouts tend to involve fewer movements and can occasionally go heavy, while longer workouts can potentially involve 3 or more movements but generally have light-to-moderate loading.

That's it for Part I. In Part II, which I expect to be out this week, I'll be focusing more on the results of this season's Open. See you soon.

*For background on these metrics, please see my post "What to Expect from the 2013 Open and Beyond." You may notice that the loading for prior years has changed slightly, which is due to me updating the relativities between lifts as I gather more data.
**For any workout with a variable element, such as the weight in 14.3 or the time in 14.2 or 14.5, I used the average of the top 1000 athletes. This is consistent with prior years.
***I treated 11.3 as a single-modality workout for this chart, even though it technically included cleans and jerks.
****Workout 13.5 is actually hidden from view here because 14.3 is covering it up (same time domain and number of movements). Workout 13.5 was also a variable-time workout and would have had the arrows on the ball.

Tuesday, April 1, 2014

Can Mid-Week Projections Work?

Two weeks ago, I proposed a method to project an athlete's overall ranking before score submissions had closed for the week. To me, it made sense on paper, but it was admittedly untested. So I put out a request for help on testing it in week 4, and thanks to Andrew Havko (among others), I was able to make that happen.

So can it work? It appears that it can. That's not to say the projections are 100% accurate, and they are far from precise very early each week. But I think it's clear that the projections can give an athlete a good sense of where they would likely finish the week if they stick with their current score, which is something that is nearly impossible currently.

I tested these projections at three points during week 4: Friday 8 a.m., Saturday 5:30 p.m. and Sunday 3:30 a.m. (all EDT). The method requires one key assumption, which is the percentage of athletes who will drop off from the prior week, and for this I used 10%. Certainly this would need a bit more careful thought if it were to be implemented by HQ.

For each athlete, I projected their overall worldwide ranking at each of these times. For athletes whose score did not change by the end of the week, I compared my projection to their ultimate ranking. In total, the errors of my projections were as follows:
  • Friday 8 a.m. (<1% of field reporting) - 9,575 mean absolute error*, 9,404 mean error
  • Saturday 5:30 p.m. (16% of field reporting) - 1,003 mean absolute error*, -787 mean error
  • Sunday 3:30 a.m. (21% of field reporting) - 1,454 mean absolute error*, -1,362 mean error
Interestingly, the projections (at least using this first basic method) got slightly worse overall from Saturday to Sunday. The reason is that the distribution of scores submitted by Saturday 5:30 p.m. was more similar to the ultimate distribution than the distribution on Sunday was. What I found was that, in general, the scores submitted very early in the week are well above average, and the quality slowly declines throughout the week. That is, until Monday evening, when a slew of athletes replace their first score with a second, improved submission. It turned out in this case that Saturday afternoon was a pretty accurate indication of how the week's scores would turn out.

However, let's look a little more closely at the errors. Although an error of 1,003 (our best mean absolute error) is pretty small for an athlete finishing, say, 40,000th, it would be a very large error for an athlete finishing 2,000th. Thankfully, the size of the errors generally increased as the ranking increased. Below is a chart showing the percentage error for athletes across the spectrum of rankings, using our Saturday afternoon projections.

So you see that generally, we never really stray further than 3% error at any point. That's not too bad when you consider that there's currently no way to get even a good ballpark estimate until at least mid-day Monday.

Still, maybe we can do better. What if we had actually used the perfect assumption (8% in this case) for the percentage of athletes who would drop off from the prior week? Well, in total, both our Saturday and Sunday projections improve, with the mean absolute error going down to 338 for Saturday and 581 for Sunday. Interestingly, though, in this particular case it doesn't improve the projections across the board. Below is the same chart as above, but with the perfect assumption for attrition.

Although our error gets a little worse near the top, once we get near the middle of the pack, these projections are nearly spot-on. And even near the top, a 5% error isn't that bad - that's like these projections putting Josh Bridges at 100th overall, whereas he actually finishes 105th.

One way we can theoretically get even closer is to adjust for the skill level of the athletes who have submitted scores at a given point. This could involve looking at the average ranking of the athletes from their prior week's scores and comparing that to what we'd expect by week's end. The trouble is, it's challenging to know what the level will be at week's end. You might expect that the field would average out to be at the 50th percentile in prior weeks, but that wasn't actually the case here. The average athlete submitting a score for 14.4 was actually about the 48th percentile in prior weeks, which is due to the fact that the athletes dropping out after 14.3 were generally from the bottom of the pack.

My point is that while such an adjustment is possible, it might not be practical. And considering the projections even with my base 10% attrition assumption weren't too bad, I don't think further adjustments are necessary, beyond refining that attrition assumption to make it as accurate as we can.

Finally, while I think this method would produce reasonable results if implemented by HQ next year, there are some caveats about the testing done here:

  • I've only done testing for one week. There may be more (or less) error if we made these projections in week 2 or week 5.
  • I'm almost certain that the percentage error would increase a bit if we do this for each region. The sample size is much smaller, which means that even if the same principles apply, we're likely to see more variability. For one thing, it's going to take longer each week before the projections are even remotely meaningful, since many regions had fewer than 100 entries until late each Friday afternoon.
  • I only tested this for the men's field. I don't see any reason why the results would be much different for women, aside from the field being smaller, which would likely increase our percentage error a bit.
All that being said, I feel that implementing this method would provide a realistic glimpse into where an athlete will wind up. As long as athletes understand that this is merely an estimate, the information provided can be quite useful. 

Would this revolutionize the sport? Of course not. But I think it would be yet another improvement to the athlete experience as the largest stage of our sport continues to grow.

*Mean absolute error is the average of our errors, if we ignore the direction of the error. So if we are off by -500 for one athlete and +500 for another, the mean absolute error is 500 but the mean error is 0.
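For reference, here is a minimal Python sketch of both error measures, plus a percentage error. The sign convention (projected minus actual) and the choice to divide by the actual rank for the percentage error are assumptions on my part.

```python
def error_summary(projected, actual):
    """Return (mean absolute error, mean error) for paired rank projections.

    Errors are taken as projected minus actual, an assumed sign convention.
    """
    errors = [p - a for p, a in zip(projected, actual)]
    mean_error = sum(errors) / len(errors)
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    return mean_abs_error, mean_error


def percent_error(projected, actual):
    """Absolute error as a fraction of the athlete's actual rank (assumed denominator)."""
    return abs(projected - actual) / actual
```

With the footnote's example (off by -500 for one athlete and +500 for another), error_summary returns (500.0, 0.0). Dividing by rank is why a 1,000-place miss is minor at 40,000th but large at 2,000th. The Josh Bridges example (projected 100th, actual 105th) works out to roughly 4.8% on this definition, or exactly 5% if you divide by the projected rank instead.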

Tuesday, March 25, 2014

Fun with SWAGs: What Will 14.5 Be?

See? I told you. I knew last week was the week. For the first time in 8 tries (dating back to last year), I put up a SWAG that came pretty close. Sure, I missed the row (who saw that coming?), but the combination of wall ball, toes-to-bar, 135/95 cleans and muscle-ups was pretty much spot-on. Will it ever happen again? Eh, probably not.

As for this week, things should be pretty straightforward with just a few movements left on the table. With that in mind, a few quick thoughts before we get to the pick:
  • I've really enjoyed the programming so far this season.  Aside from this past week, it has not been in my favor, but still, I like that they have been willing to get creative and take some chances with the programming. They've also avoided any judging disasters like they had last season with 13.2.
  • Having the video requirement for regional qualifiers seems to have increased the number of videos available for top athletes. This has been nice to see, as it's highlighted how completely legit most of the top athletes are. When I watch a video of a Games-level athlete, the reps are all crystal clear and the range of motion leaves no doubt (except in the head-to-head Open announcement WODs, which seem to have had some questionable reps, in my opinion).
  • I have to say, I'm not sure why HQ felt they needed to add a row this year. In any other setting, I would have no problem with this, but if they really want the Open to be "the most egalitarian sport in the world," requiring the use of a $1,000 piece of equipment doesn't really make sense. Sure, this only impacts 0.1% of the athletes in the Open, but really, was it necessary?
  • I'll have more on this later this week, but at first glance, it appears that my method for mid-week projections worked fairly well last week. Many thanks to Andrew Havko for pulling the data for me at several points this weekend, and thanks to the several others who offered up their services for helping me pull the data.
OK, let's get down to brass tacks. We all know what's left on the table for 14.5: burpees and thrusters. Yes, jerks are left as well, but I don't see them skipping burpees or thrusters, and I doubt they'll put jerks in a workout that already has thrusters. So I'm going to assume it's a couplet of burpees and thrusters.

Weight-wise, I think this needs to be relatively heavy, considering the LBEL and average weight load are still below their historical averages. As far as the time domain, I think they'll keep it short. We have been hovering around 10 minutes throughout the Open, but considering last week was 14 minutes and we're likely to have only two movements this week, I can't see them going long for 14.5.

Bearing all that in mind, let's get to it. My SWAG for 14.5 is:

AMRAP 7 of 3 bar-facing burpees, 3 thrusters (115/75), 6 bar-facing burpees, 6 thrusters, 9 bar-facing burpees, 9 thrusters, ...

I hate putting down something that looks so familiar, but we've seen that 7-minute, 3-6-9-... pattern in each of the past three Opens, so I feel like I had to go with it again this year. I figure the 115/75 thrusters are a bit of a diversion from prior years where it has typically been 100/65, but yeah, this one is kinda dull. Let's hope Castro proves me wrong.

Post SWAGs to comments, and good luck to everyone on 14.5!

Tuesday, March 18, 2014

Fun with SWAGs: What Will 14.4 Be?

For 14.1, I whiffed on the movements but at least got the time frame right. For 14.2, I did manage to call the overhead squats but basically missed on everything else. Last week? I pretty much got no part of 14.3.

But this week? THIS IS IT. I can feel it.

Obviously, things should be getting easier for us, since fewer movements are left on the table. Here is what we have left, in order of emphasis in past years: burpees, thrusters, jerks, toes-to-bar, cleans, muscle-ups, wall balls and push-ups. Of those, I'd expect them all to come up at some point, with the exception of push-ups. So that means we probably have a triplet coming up, or possibly even a workout with 4 or more movements, which has never occurred in the Open before.

You may have also noticed that the complexity has increased each week so far, along with the weight level. In fact, the last two weeks have been among the most unique workouts in the four years of the Open. My theory is that this trend will continue this week before HQ releases something simple and classic to close things out on 14.5. So I'm going to leave a couple of classic movements on the table for 14.5: burpees and thrusters.

I also feel like they need to go long this week, since they really haven't gone beyond 10 minutes for all intents and purposes so far (on 14.2, the athletes lasting beyond 10 minutes probably were cruising for the first 3 or 6 minutes). Weight-wise, this could go either way, since the LBEL is very close to the historic average at this point. So for 14.4, I'm thinking something long and funky, with a bunch of movements.

With that all in mind, here's what I got for 14.4:

AMRAP 15 of 60 toes-to-bar, 60 wall balls, 30 clean and jerks (135/95), 30 muscle-ups

I wouldn't waste your time with your own pick. I just nailed it. Sorry, I know you wish you had thought of it first. Post congratulatory remarks to comments.

Monday, March 17, 2014

A Method to Project Overall Open Rankings Mid-Week

One quirk about the Open leaderboard is that while a workout is open for submissions, the overall rankings are basically useless. The rankings for the current week's workout are obviously understated, but as I explained in my previous post, you can at least get a decent sense of where the score will end up by looking at the percentile rank at any point in time. The overall rankings, however, are thrown off because the most recent week's rank is so understated relative to the prior weeks' ranks. For instance, suppose an athlete finished 300th in each of the first two weeks and then posts the best score in the world early in week 3. He will still appear behind an athlete who finished 290th in each of the first two weeks and is currently 10th of 100 entries in week 3 (601 points vs. 590). But we know that by week's end, there will be much more separation between the athletes in their week 3 ranks, which will place the first athlete well in front.

So is there a way we can get at an accurate projection of an athlete's overall ranking mid-week? I think we can, but not without a little bit of work.

The idea is this: since we can reasonably project the ending percentile ranking for the current week's workout, we should be able to reasonably project the ending rank, if we make an assumption about how many athletes will complete the workout. If we can get that projection for any particular athlete, we should be able to do that for all athletes who have completed the workout. At that point, we can re-rank those athletes based on projected total points. Using that, we can basically "scale up" those rankings based on how many athletes we anticipate will complete the current week's workout.

More specifically, here is the process I am proposing:
  • Compute each athlete's percentile ranking for the current week (either overall or in the region) based on the athletes who have currently submitted scores
  • Based on the number of athletes in contention at the end of the prior week, reduce that by some factor (say 10%, which is near the historical average) to get an estimate of the number of athletes who will remain at the end of the current week
  • Multiply the athlete's percentile ranking by the estimated number of athletes who will remain at the end of the current week to get the projected rank for the current workout
  • Use these projected ranks to get a projected overall point total at the end of the current week
  • Re-rank the athletes who have submitted scores based on the projected point totals
  • Convert the projected rankings to a percentile rank based on the number of athletes who have currently submitted
  • Multiply this percentile by our earlier estimate about how many athletes will remain at the end of the current week. This will give you each athlete's projected overall rank at the end of the week.
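Here is a minimal Python sketch of the seven steps above. It assumes a leaderboard snapshot has already been loaded into two hypothetical dictionaries: each submitter's point total from prior weeks, and each submitter's rank on the current workout among those who have submitted so far. Points are the sum of event ranks, so lower is better. The same function works for either the worldwide or a regional leaderboard, as long as all inputs come from the same one.

```python
def project_overall(prior_points, current_rank, prior_field, attrition=0.10):
    """Project each submitter's end-of-week overall rank.

    prior_points: athlete -> total points (sum of event ranks) through last week
    current_rank: athlete -> rank on this week's workout among current submitters
    prior_field:  athletes still in contention at the end of last week
    attrition:    assumed fraction of the field that drops off this week
    """
    submitted = len(current_rank)
    remaining = prior_field * (1 - attrition)  # estimated field at week's end

    # Steps 1-4: percentile rank (rank / submitted) times the projected field
    # gives a projected rank for this week; add it to prior-week points.
    projected_total = {
        name: prior_points[name] + rank * remaining / submitted
        for name, rank in current_rank.items()
    }

    # Step 5: re-rank submitters on projected points (lower is better; ties share a rank).
    first_position = {}
    for position, points in enumerate(sorted(projected_total.values()), start=1):
        first_position.setdefault(points, position)

    # Steps 6-7: convert to a percentile of current submitters, then scale up again.
    return {
        name: first_position[points] * remaining / submitted
        for name, points in projected_total.items()
    }
```

Plugging in the example from the top of this post (300th/300th/1st of 100 vs. 290th/290th/10th of 100), with a hypothetical field of 10,000 last week and the other 98 submitters at 1,000 points apiece, the first athlete projects to 90th overall and the second to 540th. That is the opposite of the raw leaderboard order (601 points vs. 590).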
To accomplish this, all we would need is a snapshot of the full leaderboard at a given point in time. I do not think it is possible to accomplish this even for a single athlete without making the calculations for all athletes. However, with the right computing power, it would be a relatively painless calculation to generate the projected overall rankings. Obviously HQ would be in the best position to perform these calculations, but I think it is conceivable that someone on the outside could do this as well.

This is all theoretical at the moment - a decent amount of testing would be necessary to make sure this process actually produces reasonable projections. Still, I think the concept is something that could be used to improve the Open experience for all of us.

Note: If anyone out there has the resources and the know-how to get a hold of the leaderboard mid-week and get it into Excel or .csv, I'd be very interested to test this out. If so, post to comments or email me directly (

Tuesday, March 11, 2014

Fun with SWAGs: What Will 14.3 Be? (And More)

Before we get SWAG-y, I'd like to get some quick thoughts out about 14.2 and also give you the results of an interesting little study I did last week.

Without further ado, a few thoughts on 14.2:
  • Really cool to see a new WOD format for the Open.  We all know the Open has typically been filled with very basic, classic CrossFit WODs, so it was refreshing to see a workout like this come out. I think it took us all a couple of days to figure out the right strategy for this one.
  • Personally, I would have liked to see the overhead squats either progress in weight or start a little bit higher. For the top-tier athletes, this was pretty much all about the chest-to-bar pull-ups. When Castro announced that the rounds would change in each 3:00 segment, I was really hoping to hear him say the weights would increase. But still... cool workout.
  • Good to see that they set the breakpoints such that more athletes were able to get past the first round, unlike 13.5. Based on a really quick look, it looks like about 70% of the men reached the second round and about 30% of the women reached the second round.
  • Looks like we saw quite a few people drop out in week 2. About 15% of the men and 15% of the women dropped out. That was a bigger percentage drop than we saw in any week last year, except for the women between 13.3 and 13.4 (19%). [UPDATE 4/12/2014 - This previously said the largest drop was between women's 13.2 and 13.3. That was a typo.] 
Also last week, I decided to try something new. I picked out two athletes who submitted scores by 9 a.m. on Friday, and every 12 hours after that, I tracked their progress on the leaderboard. My goal was to understand how the ranking of a particular score will evolve over the course of the weekend. Obviously an athlete's position would continue to get worse, but would his or her rank decrease (or increase) as a percentage of the field? If not, then an athlete could quickly get a sense of where his or her score might end up simply by checking out the percentile rank early on.
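If you want to run this kind of check yourself, the sketch below computes the percentile of one fixed score in each saved leaderboard snapshot (say, one every 12 hours). It assumes each snapshot is just a list of raw scores and defines percentile as the share of the field with a strictly worse score. That tie-handling choice is an assumption, and the leaderboard itself may treat ties differently.

```python
def percentile_of_score(score, field_scores, higher_is_better=True):
    """Percentile (0-100, higher = better) of a fixed score within one snapshot.

    Defined here as the share of submitted scores that are strictly worse;
    other tie-handling conventions are equally reasonable.
    """
    if higher_is_better:
        worse = sum(1 for s in field_scores if s < score)
    else:  # e.g. for-time workouts, where a lower time is better
        worse = sum(1 for s in field_scores if s > score)
    return 100.0 * worse / len(field_scores)


def track_score(score, snapshots, higher_is_better=True):
    """Percentile of the same score in each successive leaderboard snapshot."""
    return [percentile_of_score(score, snap, higher_is_better) for snap in snapshots]
```

If the list returned by track_score stays roughly flat from the first snapshot onward, an early check of the percentile is a reasonable guide to where the score will finish.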

So let's see what happened. These athletes each were ranked in the 90th percentile at the start of the weekend, and I tracked the rank of the score, not necessarily the athlete. If the athlete retried the workout, their new score did not count towards this study.

Looking worldwide, we can see that in fact, the athlete's percentile stayed relatively stable. The men's percentile improved slightly, which I think is actually more typical. The women's score I picked was 87 (one rep away from finishing the 12's), and I suspect that a lot of women worked just hard enough to get past that hump on their second try.

Quickly, let's take a look at the progression in the region, which is generally more important for most of us.

Here we see that the athletes actually started at a lower (worse) percentile in the region. In the North American regions, this is probably common, since the only people performing the workouts that early on are generally strong athletes. In the rest of the world, those scores early on could come in normal Friday classes. But by 9 p.m. Friday, the athlete's percentile was pretty indicative of how things would finish up.

Anyway... on to the SWAG. Briefly, here is what we know:
  • The movements left on the table (in order of how much emphasis has been placed on them in the past three years): burpee, thruster, jerk, toes-to-bar, box jump, muscle-up, wall ball, clean, deadlift, push-up.
  • The LBEL so far is 0.38 for men and 0.27 for women. That's fairly close to the historical average for women (0.30), but below the historical average for men (0.46). So I think it's likely that things will get a bit heavier over the final three workouts than they have been to this point. That may or may not be true for 14.3 specifically, though.
  • Neither workout so far has been particularly long. 14.1 was 10:00, and 14.2 had at most 9:00 of challenging work for any particular athlete, when you consider that the early rounds were nothing more than a warm-up for the top athletes.
  • The first two workouts have been couplets. In 2011, there were three triplets, and in 2012 and 2013 there were two triplets each, so I think it's fair to assume we're getting a triplet soon. There were no single-modality workouts in 2013, and I doubt we'll see one this year.
With that as our framework, here we go with the SWAG for 14.3:

AMRAP 16 of 8 thrusters (115/75), 10 toes-to-bar, 12 bar-facing burpees

Again, you can pretty much take it to the bank that I will be way off. But it's not like you can do better. Of course, if you think you can, post SWAGs to comments. Enjoy 14.3 everyone!

Tuesday, March 4, 2014

Fun with SWAGs: What Will 14.2 Be?

One week in the books. I don't know about you, but I am once again shocked by the jump in competition from last year. I know personally I feel fitter than ever, but I find myself lower on the leaderboard than in past years (~780 in the Central East). It's tough seeing the top of the leaderboard get further and further away, but it's also reassuring that the sport is headed in the right direction. With 6,500+ men in the competition here in my region, only the absolute best of the best will even make it to Regionals. Marcus Hendren, top 10 in the Games the past two years, is currently outside the top 100 in his own region. This is where we are these days. For the rest of us, there's nothing to do but work hard to keep up.

Anyhow, let's move on. I may not qualify for Regionals, but I still have a chance to fulfill another lifelong dream: correctly predicting an Open workout the week beforehand. Last week, with basically nothing to go on, I wasn't particularly close, except for the 10-minute time domain. But I have a feeling this is my week. NOW IS THE TIME.

OK, so let's get started. Let's assume that snatches and double-unders are off the table (even though half of the field probably used clean-and-jerks instead of snatches). Let's go ahead and assume they'll hold thrusters and pull-ups until the end again (although I really hope they don't). I also don't think they'll put cleans or jerks in this week, since they were an option in 14.1. That leaves us with the following movements on the table, in order of how much emphasis was put on them in the past: burpee, toes-to-bar, box jump, muscle-up, wall ball, deadlift, push-up, overhead squat.

Last week was a couplet with relatively light weights, so I'm going to assume this week will be a triplet with considerably heavier weights. As far as the time domain, it could really go either way, but let's go with... long?

Well, unfortunately I haven't really narrowed it down a whole lot. But that's never stopped me from going bold with a prediction before. Here goes:

AMRAP 15 of 10 bar-facing burpees, 10 overhead squats (115/75), 10 toes-to-bar

Zero percent chance that happens, of course. But let's see you try to do better. Post SWAGs to comments, and good luck on 14.2!