Tuesday, April 1, 2014

Can Mid-Week Projections Work?

Two weeks ago, I proposed a method to project an athlete's overall ranking before score submissions had closed for the week. To me, it made sense on paper, but it was admittedly untested. So I put out a request for help on testing it in week 4, and thanks to Andrew Havko (among others), I was able to make that happen.

So can it work? It appears that it can. That's not to say the projections are 100% accurate, and they are far from precise very early each week. But I think it's clear that the projections can give an athlete a good sense of where they would likely finish the week if they stick with their current score, which is something that is nearly impossible currently.

I tested these projections at three points during week 4: Friday 8 a.m., Saturday 5:30 p.m. and Sunday 3:30 a.m. (all EDT). The method requires one key assumption, which is the percentage of athletes who will drop off from the prior week, and for this I used 10%. Certainly this would need a bit more careful thought if it were to be implemented by HQ.

For each athlete, I projected their overall worldwide ranking at each of these times. For athletes whose score did not change by the end of the week, I compared my projection to their ultimate ranking. In total, the errors of my projections were as follows:
  • Friday 8 a.m. (<1% of field reporting) - 9,575 mean absolute error*, 9,404 mean error
  • Saturday 5:30 p.m. (16% of field reporting) - 1,003 mean absolute error*, -787 mean error
  • Sunday 3:30 a.m. (21% of field reporting) - 1,454 mean absolute error*, -1,362 mean error
Interestingly, the projections (at least using this first basic method) got slightly worse overall from Saturday to Sunday. The reason is that the distribution of scores submitted by Saturday 5:30 p.m. was more similar to the ultimate distribution than the distribution on Sunday was. What I found was that, in general, the scores submitted very early in the week are well above average, and the quality slowly declines throughout the week. That is, until Monday evening, when a slew of athletes replace their first score with an improved second submission. It turned out in this case that Saturday afternoon was a pretty accurate indication of how the week's scores would turn out.

However, let's look a little more closely at the errors. Although an error of 1,003 (our best mean absolute error) is pretty small for an athlete finishing, say, 40,000th, it would be a very large error for an athlete finishing 2,000th. Thankfully, the size of the errors generally increased as the ranking increased. Below is a chart showing the percentage error for athletes across the spectrum of rankings, using our Saturday afternoon projections.


So you see that generally, we never really stray further than 3% error at any point. That's not too bad when you consider that there's currently no way to get even a good ballpark estimate until at least mid-day Monday.

Still, maybe we can do better. What if we had actually used the perfect assumption (8% in this case) for the percentage of athletes who would drop off from the prior week?  Well, in total, we improve for our Saturday and Sunday projections, with the mean absolute error going down to 338 for Saturday and 581 on Sunday. Interestingly, though, in this particular case it doesn't necessarily improve the projections across the board for Saturday and Sunday. Below is the same chart as above, but with the perfect assumption for attrition.


Although our error gets a little worse near the top, once we get near the middle of the pack, these projections are nearly spot-on. And even near the top, a 5% error isn't that bad - that's like these projections putting Josh Bridges at 100th overall, whereas he actually finishes 105th.
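The percentage errors discussed above can be checked with a few lines of Python. This is only a sketch: the function name and the sign convention (projected minus actual, divided by the actual rank) are my assumptions here, not necessarily how the charts were built.

```python
def percentage_error(projected_rank, actual_rank):
    """Signed error as a percentage of the actual rank (assumed convention)."""
    return (projected_rank - actual_rank) / actual_rank * 100


# The example from the text: projected 100th, actually finishes 105th.
print(round(percentage_error(100, 105), 1))  # -4.8

# The same absolute miss of 1,003 places matters far more near the top.
mid_pack = percentage_error(40_000 + 1_003, 40_000)  # roughly 2.5%
near_top = percentage_error(2_000 + 1_003, 2_000)    # roughly 50%
print(mid_pack, near_top)
```

This is why the charts use percentage error rather than raw places: a miss of a fixed size means very different things at different points on the leaderboard.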

One way we can theoretically adjust to get even closer is to make an adjustment for the skill level of the athletes who have submitted scores at a given point. This could involve looking at the average ranking of the athletes from their prior week's scores and comparing that to what we'd expect by week's end. The trouble is, it's challenging to know what the level will be at week's end. You might expect that the field would average out to be at the 50th percentile in prior weeks, but that wasn't actually the case here. The average athlete submitting a score for 14.4 was actually about the 48th percentile in prior weeks, which is due to the fact that the athletes dropping out after 14.3 were generally from the bottom of the pack.

My point is that while such an adjustment is possible, it might not be practical. And considering the projections even with my base 10% attrition assumption weren't too bad, I don't think further adjustments are necessary, beyond refining that attrition assumption to make it as accurate as we can.

Finally, while I think this method would produce reasonable results if implemented by HQ next year, there are some caveats about the testing done here:

  • I've only done testing for one week. There may be more (or less) error if we made these projections in week 2 or week 5.
  • I'm almost certain that the percentage error would increase a bit if we do this for each region. The sample size is much smaller, which means that even if the same principles apply, we're likely to see more variability. For one thing, it's going to take longer each week before the projections are even remotely meaningful, since many regions had fewer than 100 entries until late each Friday afternoon.
  • I only tested this for the men's field. I don't see any reason why the results would be much different for women, aside from the field being smaller, which would likely increase our percentage error a bit.
All that being said, I feel that implementing this method would provide a realistic glimpse into where an athlete will wind up. As long as athletes understand that this is merely an estimate, the information provided can be quite useful. 

Would this revolutionize the sport? Of course not. But I think it would be yet another improvement to the athlete experience as the largest stage of our sport continues to grow.


*Mean absolute error is the average of our errors, if we ignore the direction of the error. So if we are off by -500 for one athlete and +500 for another, the mean absolute error is 500 but the mean error is 0.
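For anyone who wants to compute these two measures themselves, here is a minimal sketch of the footnote's example. The function names and the two-athlete data are made up for illustration, and I'm assuming the error is measured as projected rank minus actual rank.

```python
def mean_error(projected, actual):
    """Average signed error (projected minus actual), so misses can cancel out."""
    errors = [p - a for p, a in zip(projected, actual)]
    return sum(errors) / len(errors)


def mean_absolute_error(projected, actual):
    """Average size of the error, ignoring its direction."""
    errors = [abs(p - a) for p, a in zip(projected, actual)]
    return sum(errors) / len(errors)


# Hypothetical athletes: one projected 500 places too low, one 500 too high.
actual = [10_000, 20_000]
projected = [9_500, 20_500]

print(mean_error(projected, actual))           # 0.0
print(mean_absolute_error(projected, actual))  # 500.0
```

A mean error near zero only says the misses are balanced in direction; the mean absolute error is what tells you how far off a typical projection is.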

Tuesday, March 25, 2014

Fun with SWAGs: What Will 14.5 Be?

See? I told you. I knew last week was the week. For the first time in 8 tries (dating back to last year), I put up a SWAG that was pretty close to spot-on. Sure, I missed the row (who saw that coming?), but the combination of wall ball, toes-to-bar, 135/95 cleans and muscle-ups was pretty much spot-on. Will it ever happen again? Eh, probably not.

As for this week, things should be pretty straightforward with just a few movements left on the table. With that in mind, a few quick thoughts before we get to the pick:
  • I've really enjoyed the programming so far this season.  Aside from this past week, it has not been in my favor, but still, I like that they have been willing to get creative and take some chances with the programming. They've also avoided any judging disasters like they had last season with 13.2.
  • Having the video requirement for regional qualifiers seems to have increased the number of videos available for top athletes. This has been nice to see, as it's highlighted how completely legit most of the top athletes are. When I watch a video of a Games-level athlete, the reps are all crystal clear and the range of motion leaves no doubt (except in the head-to-head Open announcement WODs, which seem to have had some questionable reps, in my opinion).
  • I have to say, I'm not sure why HQ felt they needed to add a row this year. In any other setting, I would have no problem with this, but if they really want the Open to be "the most egalitarian sport in the world," requiring the use of a $1,000 piece of equipment doesn't really make sense. Sure, this only impacts 0.1% of the athletes in the Open, but really, was it necessary?
  • I'll have more on this later this week, but at first glance, it appears that my method for mid-week projections worked fairly well last week. Many thanks to Andrew Havko for pulling the data for me at several points this weekend, and thanks to the several others who offered up their services for helping me pull the data.
OK, let's get down to brass tacks. We all know what's left on the table for 14.5: burpees and thrusters. Yes, jerks are left as well, but I don't see them skipping burpees or thrusters, and I doubt they'll put jerks in a workout that already has thrusters. So I'm going to assume it's a couplet of burpees and thrusters.

Weight-wise, I think this needs to be relatively heavy, considering the LBEL and average weight load are still below their historical averages. As far as the time domain, I think they'll keep it short. We have been hovering around 10 minutes throughout the Open, but considering last week was 14 minutes and we're likely to have only two movements this week, I can't see them going long for 14.5.

Bearing all that in mind, let's get to it. My SWAG for 14.5 is:

AMRAP 7 of 3 bar-facing burpees, 3 thrusters (115/75), 6 bar-facing burpees, 6 thrusters, 9 bar-facing burpees, 9 thrusters, ...

I hate putting down something that looks so familiar, but we've seen that 7-minute, 3-6-9-... pattern in each of the past three Opens, so I feel like I had to go with it again this year. I figure the 115/75 thrusters are a bit of a diversion from prior years where it has typically been 100/65, but yeah, this one is kinda dull. Let's hope Castro proves me wrong.

Post SWAGs to comments, and good luck to everyone on 14.5!


Tuesday, March 18, 2014

Fun with SWAGs: What Will 14.4 Be?

For 14.1, I whiffed on the movements but at least got the time frame right. For 14.2, I did manage to call the overhead squats but basically missed on everything else. Last week? I pretty much got no part of 14.3.

But this week? THIS IS IT. I can feel it.

Obviously, things should be getting easier for us, since fewer movements are left on the table. Here is what we have left, in order of emphasis in past years: burpees, thrusters, jerks, toes-to-bar, cleans, muscle-ups, wall balls and push-ups. Of those, I'd expect them all to come up at some point, with the exception of push-ups. So that means we probably have a triplet coming up, or possibly even a workout with 4 or more movements, which has never occurred in the Open before.

You may have also noticed that the complexity has increased each week so far, along with the weight level. In fact, the last two weeks have been among the most unique workouts in the four years of the Open. My theory is that this trend will continue this week before HQ releases something simple and classic to close things out on 14.5. So I'm going to leave a couple classic movements on the table for 14.5: burpees and thrusters.

I also feel like they need to go long this week, since they really haven't gone beyond 10 minutes for all intents and purposes so far (on 14.2, the athletes lasting beyond 10 minutes probably were cruising for the first 3 or 6 minutes). Weight-wise, this could go either way, since the LBEL is very close to the historic average at this point. So for 14.4, I'm thinking something long and funky, with a bunch of movements.

With that all in mind, here's what I got for 14.4:

AMRAP 15 of 60 toes-to-bar, 60 wall balls, 30 clean and jerks (135/95), 30 muscle-ups

I wouldn't waste your time with your own pick. I just nailed it. Sorry, I know you wish you had thought of it first. Post congratulatory remarks to comments.

Monday, March 17, 2014

A Method to Project Overall Open Rankings Mid-Week

One quirk about the Open leaderboard is that while a workout is open for submissions, the overall rankings are basically useless. The rankings for the current week's workout are obviously understated, but as I explained in my previous post, you can at least get a decent sense of where the score will end up by looking at the percentile rank at any point in time. The overall rankings, however, are screwed up because the most recent week's rank is so understated in relation to the prior weeks' scores. For instance, an athlete who was in 300th place in each of the first two weeks but posts the best score in the world early in week 3 will still appear behind an athlete who finished 290th in each of the first two weeks but is currently 10th of 100 entries in week 3. But we know that by week's end, there will be much more separation between the athletes in their week 3 ranks, which will place the first athlete well in front.

So is there a way we can get at an accurate projection of an athlete's overall ranking mid-week? I think we can, but not without a little bit of work.

The idea is this: since we can reasonably project the ending percentile ranking for the current week's workout, we should be able to reasonably project the ending rank, if we make an assumption about how many athletes will complete the workout. If we can get that projection for any particular athlete, we should be able to do that for all athletes who have completed the workout. At that point, we can re-rank those athletes based on projected total points. Using that, we can basically "scale up" those rankings based on how many athletes we anticipate will complete the current week's workout.

More specifically, here is the process I am proposing:
  • Compute each athlete's percentile ranking for the current week (either overall or in the region) based on the athletes who have currently submitted scores
  • Based on the number of athletes in contention at the end of the prior week, reduce that by some factor (say 10%, which is near the historical average) to get an estimate of the number of athletes who will remain at the end of the current week
  • Multiply the athlete's percentile ranking by the estimated number of athletes who will remain at the end of the current week to get the projected rank for the current workout
  • Use these projected ranks to get a projected overall point total at the end of the current week
  • Re-rank the athletes who have submitted scores based on the projected point totals
  • Convert the projected rankings to a percentile rank based on the number of athletes who have currently submitted
  • Multiply this percentile by our earlier estimate about how many athletes will remain at the end of the current week. This will give you each athlete's projected overall rank at the end of the week.
To accomplish this, all we would need is a snapshot of the full leaderboard at a given point in time. I do not think it is possible to accomplish this even for a single athlete without making the calculations for all athletes. However, with the right computing power, it would be a relatively painless calculation to generate the projected overall rankings. Obviously HQ would be in the best position to perform these calculations, but I think it is conceivable that someone on the outside could do this as well.
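To make the steps above concrete, here is a rough Python sketch of the process. It is an illustration under several assumptions, not a finished implementation: the field names (`prior_points`, `current_rank`) are hypothetical, overall points are assumed to be the sum of an athlete's weekly ranks (lowest total wins), ties are ignored, and the four-athlete leaderboard is made up.

```python
def project_overall_ranks(athletes, prior_field_size, attrition=0.10):
    """Project end-of-week overall ranks from a mid-week leaderboard snapshot.

    athletes: list of dicts with hypothetical keys 'name', 'prior_points'
    (sum of ranks from earlier weeks) and 'current_rank' (rank among the
    athletes who have submitted a score so far this week).
    """
    n_submitted = len(athletes)
    # Estimate how many athletes will remain at the end of the week.
    projected_field = prior_field_size * (1 - attrition)

    # Scale each current-week rank up to the projected field, then add it
    # to the points already banked from prior weeks.
    projected_points = {}
    for a in athletes:
        percentile = a["current_rank"] / n_submitted
        projected_workout_rank = percentile * projected_field
        projected_points[a["name"]] = a["prior_points"] + projected_workout_rank

    # Re-rank the submitters on projected points (lowest first), convert that
    # position to a percentile, and scale it up to the projected field.
    ordered = sorted(projected_points, key=projected_points.get)
    return {
        name: round(position / n_submitted * projected_field)
        for position, name in enumerate(ordered, start=1)
    }


# Made-up snapshot: 4 submitters so far, 1,000 athletes at the end of last week.
leaderboard = [
    {"name": "A", "prior_points": 600, "current_rank": 1},
    {"name": "B", "prior_points": 580, "current_rank": 2},
    {"name": "C", "prior_points": 100, "current_rank": 3},
    {"name": "D", "prior_points": 1500, "current_rank": 4},
]
projection = project_overall_ranks(leaderboard, prior_field_size=1_000)
print(projection)  # {'C': 225, 'A': 450, 'B': 675, 'D': 900}
```

On the raw snapshot, athlete B (580 + 2 = 582 points) sits ahead of athlete A (600 + 1 = 601), but once the current-week ranks are scaled up to the projected field, A moves ahead of B. That is the same effect as in the 300th-versus-290th example earlier in this post.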

This is all theoretical at the moment - a decent amount of testing would be necessary to make sure this process actually produces reasonable projections. Still, I think the concept is something that could be used to improve the Open experience for all of us.

Note: If anyone out there has the resources and the know-how to get a hold of the leaderboard mid-week and get it into Excel or .csv, I'd be very interested to test this out. If so, post to comments or email me directly (anders@alumni.wfu.edu).

Tuesday, March 11, 2014

Fun with SWAGs: What Will 14.3 Be? (And More)

Before we get SWAG-y, I'd like to get some quick thoughts out about 14.2 and also give you the results of an interesting little study I did last week.

Without further ado, a few thoughts on 14.2:
  • Really cool to see a new WOD format for the Open.  We all know the Open has typically been filled with very basic, classic CrossFit WODs, so it was refreshing to see a workout like this come out. I think it took us all a couple of days to figure out the right strategy for this one.
  • Personally, I would have liked to see the overhead squats either progress in weight or start a little bit higher. For the top-tier athletes, this was pretty much all about the chest-to-bar pull-ups. When Castro announced that the rounds would change in each 3:00 segment, I was really hoping to hear him say the weights would increase. But still... cool workout.
  • Good to see that they set the breakpoints such that more athletes were able to get past the first round, unlike 13.5. Based on a really quick look, it looks like about 70% of the men reached the second round and about 30% of the women reached the second round.
  • Looks like we saw quite a few people drop out in week 2. About 15% of the men and 15% of the women dropped out. That was a bigger percentage drop than we saw in any week last year, except for the women between 13.3 and 13.4 (19%). [UPDATE 4/12/2014 - This previously said the largest drop was between women's 13.2 and 13.3. That was a typo.] 
Also last week, I decided to try something new. I picked out two athletes who submitted scores by 9 a.m. on Friday morning, and each 12 hours after that, I tracked their progress on the leaderboard. My goal was to understand how the ranking of a particular score will evolve over the course of the weekend. Obviously an athlete's position would continue to get worse, but would his or her rank decrease (or increase) as a percentage of the field? If not, then an athlete could quickly get a sense of where his or her score might end up simply by checking out the percentile rank early on.

So let's see what happened. These athletes each were ranked in the 90th percentile at the start of the weekend, and I tracked the rank of the score, not necessarily the athlete. If the athlete retried the workout, their new score did not count towards this study.
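If you want to track a score the same way, the calculation is simple. Here is a small sketch; the snapshot numbers are invented for illustration and are not the data from my study, and the percentile convention (share of the field ranked behind the score) is an assumption.

```python
def percentile(rank, field_size):
    """Share of the field ranked behind this score, as a percentage."""
    return (1 - rank / field_size) * 100


# Hypothetical snapshots of one score: (time, rank of the score, entries so far)
snapshots = [
    ("Fri 9 a.m.", 50, 500),
    ("Fri 9 p.m.", 400, 4_000),
    ("Sat 9 a.m.", 1_100, 10_000),
]

for label, rank, field in snapshots:
    # The raw rank keeps getting worse, but the percentile barely moves.
    print(label, rank, round(percentile(rank, field), 1))
```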

Looking worldwide, we can see that in fact, the athlete's percentile stayed relatively stable. The men's percentile improved slightly, which I think is actually more typical. The women's score I picked was 87 (one rep away from finishing the 12's), and I suspect that a lot of women worked just hard enough to get past that hump on their second try.

Quickly, let's take a look at the progression in the region, which is generally more important for most of us.


Here we see that the athletes actually started at a lower (worse) percentile in the region. In the North American regions, this is probably common, since the only people performing the workouts that early on are generally strong athletes. In the rest of the world, those scores early on could come in normal Friday classes. But by 9 p.m. Friday, the athlete's percentile was pretty indicative of how things would finish up.

Anyway... on to the SWAG. Briefly, here is what we know:
  • The movements left on the table (in order of how much emphasis has been placed on them in the past three years): burpee, thruster, jerk, toes-to-bar, box jump, muscle-up, wall ball, clean, deadlift, push-up.
  • The LBEL so far is 0.38 for men and 0.27 for women. That's fairly close to the historical average for women (0.30), but below the historical average for men (0.46). So I think it's likely that things will get a bit heavier over the final three workouts than they have been to this point. That may or may not be true for 14.3 specifically, though.
  • Neither workout so far has been particularly long. 14.1 was 10:00, and 14.2 had at most 9:00 of challenging work for any particular athlete, when you consider that the early rounds were nothing more than a warm-up for the top athletes.
  • The first two workouts have been couplets. In 2011, there were three triplets, and in 2012 and 2013 there were two triplets each, so I think it's fair to assume we're getting a triplet soon. There were no single-modality workouts in 2013, and I doubt we'll see one this year.
With that as our framework, here we go with the SWAG for 14.3:

AMRAP 16 of 8 thrusters (115/75), 10 toes-to-bar, 12 bar-facing burpees

Again, you can pretty much take it to the bank that I will be way off. But it's not like you can do better. Of course, if you think you can, post SWAGs to comments. Enjoy 14.3 everyone!

Tuesday, March 4, 2014

Fun with SWAGs: What Will 14.2 Be?

One week in the books. I don't know about you, but I am once again shocked by the jump in competition from last year. I know personally I feel fitter than ever, but I find myself lower on the leaderboard than in past years (~780 in the Central East). It's tough seeing the top of the leaderboard get further and further away, but it's also reassuring that the sport is headed in the right direction. With 6,500+ men in the competition here in my region, only the absolute best of the best will even make it to Regionals. Marcus Hendren, top 10 in the Games the past two years, is currently outside the top 100 in his own region. This is where we are these days. For the rest of us, there's nothing to do but work hard to keep up.

Anyhow, let's move on. I may not qualify for Regionals, but I still have a chance to fulfill another lifelong dream: correctly predicting an Open workout the week beforehand. Last week, with basically nothing to go on, I wasn't particularly close, except for the 10-minute time domain. But I have a feeling this is my week. NOW IS THE TIME.

OK, so let's get started. Let's assume that snatches and double-unders are off the table (even though half of the field probably used clean-and-jerks instead of snatches). Let's go ahead and assume they'll hold thrusters and pull-ups until the end again (although I really hope they don't). I also don't think they'll put cleans or jerks in this week, since they were an option in 14.1. That leaves us with the following movements on the table, in order of how much emphasis was put on them in the past: burpee, toes-to-bar, box jump, muscle-up, wall ball, deadlift, push-up, overhead squat.

Last week was a couplet with relatively light weights, so I'm going to assume this week will be a triplet with considerably heavier weights. As far as the time domain, it could really go either way, but let's go with... long?

Well, unfortunately I haven't really narrowed it down a whole lot. But that's never stopped me from going bold with a prediction before. Here goes:

AMRAP 15 of 10 bar-facing burpees, 10 overhead squats (115/75), 10 toes-to-bar

Zero percent chance that happens, of course. But let's see you try to do better. Post SWAGs to comments, and good luck on 14.2!


Monday, March 3, 2014

Quick Hits: Open 14.1 Initial Thoughts and Analysis

So, all of us misfired on our SWAGs this week, although give John Nail credit for almost calling a repeat of 11.1 before overthinking it and going in a different direction. KISS principle, John.

Anyway, with 14.1 being a repeat workout, many of us knew about what to expect from this one. I personally had done it 4 times before, so I knew what I was getting into. But as is often the case with CrossFit, you always find yourself learning something new. One thing I found was that I actually benefited from mixing in clean-and-jerks with the snatches. In the past, I think my pride had gotten the best of me and kept me from using the C&J, but I actually found them very helpful. Around the midpoint of the workout, I started using a clean-and-strict press for about 2/3 of my reps. I found the strict press to be nice because it took the stress totally off my legs for a brief moment. I ended up with a PR by 30 double-unders + 2 snatches (score of 302), but I think I'm going to give it one more shot tomorrow and mix in even more C&J's early on.

That being said, I personally would have preferred that the workout required the snatch. I think the option to use C&J's is basically a loophole that was exploited a lot more than perhaps HQ anticipated, much like the step-ups last year in 13.2. For many people, they won't even need to perform a single snatch during the Open, since it's unlikely (but possible) snatches will be programmed again.

I also spent a bit of time this afternoon reviewing video submissions to do a quick version of the leveraging analysis that I did for several Open workouts last year (as well as Jackie from Regionals). The concept here is to look at the average pace for each movement and the variability of the pace for each movement, which will then allow me to understand which movement is more leveraged. By leveraged, what I mean is a measure of how much an athlete's score will suffer if he/she struggles with a particular movement. See my post "How is Jackie Being Won?" from last May for more details.

Today's analysis is based on reviewing 10 male athletes who scored between 303 and 356. These are the type of athletes who will be fighting for those final regional spots. The analysis would certainly look different if we looked at a more average athlete or at a Games-level athlete. In particular, the average athlete may struggle more with double-unders, but this analysis assumes that the athlete is at least competent with double-unders.

Note that for the sake of time, I timed a couple of rounds for each athlete toward the middle of the workout. Their overall pace was probably slightly faster, but the overall message here is still valid, in my opinion.


Those final leverage figures mean the following:

  • An athlete who is 1 standard deviation worse on the double-unders but 1 standard deviation better on the snatches actually finishes 1.9% better than an athlete who is average at both movements.
  • An athlete who is 1 standard deviation worse on the snatches but 1 standard deviation better on the double-unders finishes 7.8% worse than an athlete who is average at both movements.
Basically, an athlete who struggles a bit with the double-unders can make up time on the snatches, whereas an athlete who is slower on the snatches will have a hard time making up enough ground on the double-unders. Interestingly, the coefficient of variation was almost identical for each movement (17% for double-unders and 16% for snatches). However, the snatches took nearly twice as long on average, which is why we see that they were so much more valuable.
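To show why the longer movement carries more leverage even when the coefficients of variation are about equal, here is a simplified sketch. It is not the exact calculation behind the figures above and will not reproduce them: the per-round times are hypothetical, and it assumes that in a fixed-length AMRAP the work completed scales with 1 divided by the time per round.

```python
from statistics import mean, stdev


def coefficient_of_variation(times):
    """Sample standard deviation divided by the mean."""
    return stdev(times) / mean(times)


def relative_score_change(base_times, adjusted_times):
    """Approximate change in AMRAP score if time per round changes."""
    return sum(base_times) / sum(adjusted_times) - 1


# Hypothetical average seconds per round (snatches take nearly twice as long).
du_avg, snatch_avg = 20.0, 38.0
du_cv, snatch_cv = 0.17, 0.16  # coefficients of variation quoted in the text

# 1 SD slower on double-unders, 1 SD faster on snatches.
slow_du = relative_score_change(
    [du_avg, snatch_avg],
    [du_avg * (1 + du_cv), snatch_avg * (1 - snatch_cv)],
)
# 1 SD faster on double-unders, 1 SD slower on snatches.
slow_snatch = relative_score_change(
    [du_avg, snatch_avg],
    [du_avg * (1 - du_cv), snatch_avg * (1 + snatch_cv)],
)

print(f"slow double-unders, fast snatches: {slow_du:+.1%}")
print(f"fast double-unders, slow snatches: {slow_snatch:+.1%}")
```

Under these assumptions, the athlete who gives up time on the double-unders but gains it on the snatches comes out ahead, and the reverse athlete comes out behind, which is the same direction as the leverage figures above.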

If there's anything to take away from this, it's that pushing the pace on the double-unders at the expense of snatches does you no good. Gather yourself during the double-unders, and be ready to roll on the snatches. (Easier said than done, of course.)

Well, that's it for today. Good luck to all those hitting this workout on Monday, and I'll see you back here Tuesday night for our SWAG for 14.2.