Thursday, July 24, 2014

Pick 'Em Rankings (updated after each day of competitions)

The contest rankings below represent what the payoffs would be using the current Games standings.  A highlighted cell indicates a correct pick.  Currently the chart below does not show how much was wagered on each athlete.  I'm working on a way to incorporate that without making the chart too cumbersome.


Congratulations to JesseM on winning the CFG Analysis 2014 Games Pick 'Em! Five of six picks right, including a big-money pick of Lauren Fisher for top 10. More analysis of the results, and the Games in general, will be coming over the next week. Stay tuned!

If you see a pick that looks incorrect for you, let me know. I typed these in by hand and easily could have made a mistake.

Quick Thoughts on the Games so far [UPDATED MORNING OF 7/27]:

  • Not quite sure how Rich Froning is back in front after the past three days. He's had so many finishes in the 20's that it's pretty shocking he could still be in first, but his ability to rack up first and second place finishes has been crucial with this scoring system.  I haven't had time to do the math on this yet, but my guess is that Mat Fraser would be in front if the regionals scoring system was used.
  • Camille Leblanc-Bazinet looks to be running away with this on the women's side.  I mentioned it to a friend of mine prior to the Games that the fact that she has had so little hype, especially compared to Julie Foucher, might really help her mentally at the Games.  I'd like to note that Camille was a 38% shot to win the Games here on CFG Analysis, but was not in Pat Sherwood's top 8 competitors.  Just a note.
  • On the flip side, the pressure on Julie Foucher may have been too much.  For whatever reason, she was the focus of so much hype in the community this year, and the Games really haven't turned out quite like people expected for her.  That being said, she still has a shot at the podium if she can put together a couple solid finishes today.
  • Spectator-wise, Saturday's events were definitely the best of the three days so far.  The muscle-up biathlon made for some tremendous drama, and the men's final in the push-pull event had quite possibly the most intense finish that we've seen in Games history.  It certainly helped that the men's competition (finally) is actually close, and that race really meant something. [END 7/27 UPDATES]
  • [7/26 UPDATES BELOW]
  • It probably goes without saying that this will be the toughest test Rich Froning has faced since he finished second in 2010. He desperately needed that win in 21-15-9, but now that he has established that he can still dominate in a traditional CrossFit workout, I would still consider him the favorite despite sitting in fourth right now.
  • Of the men in front of him, I think Josh Bridges has the best shot to take him down.  While I don't think Khalipa will fall far, I also think that he generally has a hard time beating Froning in the traditional workouts, so I expect Froning will continue to gain on him the rest of the weekend.  But Bridges is one of just a handful of people that can beat Froning on the classic CrossFit workouts if they're in his wheelhouse.
  • If Julie Foucher isn't able to make some ground up quickly today, this is set up to be a two-horse race on the women's side.  Camille has survived the early workouts, which typically have hurt her in the past, and the remainder of the weekend should set up well for her.
  • Interesting that none of our 25 participants picked current leaders Kara Webb or Jason Khalipa to win.  Khalipa is popular enough that I am surprised no one took a flier on him at something like 70:1.  It's a little more understandable on the women's side, since Kara Webb wasn't quite as good of a payoff at 18:1 and Julie Foucher seemed like a great value at 6:1. [END 7/26 UPDATES]

Wednesday, July 16, 2014

Quick Hits: Contest Updates and Last-Minute Games Thoughts

T-minus one week, everyone.

Not that it should be a shock to anyone who's followed the sport for the past few years, but the Games will be kicking off two days earlier than originally scheduled, beginning with "The Beach" on Wednesday, July 23. We know a little bit about what's to come at this year's Games, but as usual, most of the weekend is still unknown. As we head down the home stretch here, I wanted to get one more post in to hit on a few topics.

First, the CFG Analysis Games Pick 'Em is now almost closed, and we've got 25 people entered as of the time of this writing. I sort of consider this a test run since there's nothing to give away and I haven't really done a whole lot of advertising, but I think we've already got enough entries to at least make things interesting and iron out any kinks for next year and beyond (that being said, anyone on the fence about entering, go ahead and throw your hat in the ring - the more, the merrier). What I think is intriguing about the set-up for this contest is that there are a lot of different strategies to employ, and we've already seen a few different ones come up.

One way to evaluate that is to look at how aggressive people have been with their wagers. For each competitor so far, I've calculated the maximum payoff they could have in the event that they hit on all six picks. These range from less than 60 points to over 600 at this point. The top 4 potential payoffs so far are:
  1. jayfo - 628.5 points (including 6 on Eric Carmody to go top 10 and 3 on Kenneth Leverich to podium)
  2. J Weezy - 345.1 points (including going with Noah Ohlsen for win, podium and top 10 on the men's side, as well as 10 points on Alessandra Pichelli to go top 10)
  3. kie - 325.0 points (including 15 points on Julie Foucher to win and a high-paying pick of Lucas Parker to podium)
  4. John Nail - 298.2 points (including a clever 13-point wager on Alessandra Pichelli to go top 10)
It's also been fascinating to see which athletes are being picked most often. Remember, these are not necessarily the athletes people think are most likely to finish in certain spots, but rather the athletes where people believe my predictions are most understated. And since my picks are largely based on Regional results, what that really boils down to is the athletes who people expect to improve on their Regional performances.

The athletes with the most points wagered on them so far are:
  1. Julie Foucher - 92 points
  2. Rich Froning - 74 points
  3. Camille Leblanc-Bazinet - 32 points

Clearly people are liking Julie Foucher to win with roughly a 6:1 payoff, but even more impressive is that people are almost exclusively picking Rich Froning to win despite a payoff of less than 1:1. Then again, it's hard to bet against a man who has won the Games three years in a row and has finished atop the Cross-Regional leaderboard three years in a row. 

As noted before, the payoffs will be adjusted if and when athletes withdraw prior to competition. So far, Rory Zambard has withdrawn. I have adjusted the payoffs already, but the effect is minimal, so I won't be re-posting the predictions. If one of the favorites drops off, I will likely re-post the picks. You are allowed to adjust your picks at any time before Wednesday's opening event for any reason.
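For readers curious how a withdrawal might shift the numbers, here is a minimal sketch. It assumes simple proportional rescaling of the win chances, which is only one plausible approach and not necessarily how the actual adjustment is done. The function name and athlete labels are hypothetical.

```python
def renormalize_after_withdrawal(win_probs, withdrawn):
    """Drop one athlete and rescale the rest so win chances sum to 1 again.

    Assumption: everyone's chance rises in proportion to what it was before.
    """
    remaining = {name: p for name, p in win_probs.items() if name != withdrawn}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("remaining probabilities must sum to a positive value")
    return {name: p / total for name, p in remaining.items()}


# Hypothetical win chances, for illustration only.
probs = {"Athlete A": 0.50, "Athlete B": 0.30, "Athlete C": 0.15, "Athlete D": 0.05}
adjusted = renormalize_after_withdrawal(probs, "Athlete D")

for name, p in adjusted.items():
    # The payoff per point wagered is 1 / chance, so it shrinks slightly.
    print(f"{name}: {p:.1%} chance, payoff {1 / p:.2f} per point")
```

When the withdrawing athlete had a small chance to begin with, the rescaling barely moves anyone else, which matches the minimal effect described above.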

Now, onto the Games themselves.  At this point, we know at least something about 5 events, but three have at least some portion left in doubt. For instance, all we know about "The Beach" is that, well, it's at the beach. And we do know there will be two sled push events, but we don't know the weight. The weight there could make a huge difference; I like Jason Khalipa's chances a lot better with a 300-lb. sled than a 100-lb. one.

With that in mind, here are a handful of thoughts on what has been released and what could be to come:

  • The events released so far are a pretty decent balance between strength and conditioning, so I don't think the remaining events will really be biased either way. What we haven't seen yet is any sort of high skill gymnastic movements, so I'd expect those to be coming Friday and Saturday night as well as Sunday.
  • For those who read this blog on a regular basis, you may recall that the "Triple 3" combines the three movements that fall into the "Pure Conditioning" group. Without question, this is going to be a test of exactly that: conditioning. Especially at this level, I don't see these athletes having too much trouble with 300 double-unders. The row is basically just a long warm-up here, and I don't think the athletes will separate much there. I think this is all going to come down to who has the lungs left on that 3 mile run.
  • Given the other events on Friday, I'm not sure that "The Beach" will actually include much running. We already have a 3-mile run and the two sled pushes on Friday, so they may not be doing a whole lot more running (plus you throw in the 300 double-unders, and there are going to be some sore calves come Saturday morning). My hunch (hope?) is that "The Beach" includes some unique stuff like paddle boarding or some sort of object carry through the water.
  • Interesting that they put the one-rep max overhead squat on Wednesday when hardly anyone will be watching. Typically the max lift events have been highlighted at the Games in the past, so I'm not sure why they buried this one at the beginning. It also seems a little repetitive considering the amount of overhead squat and heavy snatches at the Regionals. 
  • I wouldn't rule out another max lift somewhere in the weekend. What I would LOVE to see is a heavy ladder, but with several reps per minute. I was in a competition a couple years ago that had 20 double-unders plus 3 clean-and-jerks each minute, and that got serious in a hurry.
  • Will the two sled events both be worth 100 points? I hope not. I like the idea of having these small events that are worth 50 points to diversify the competition without weighting one particular skill too much.
That's it for today. Unless there are any major updates needed to the contest page, I expect you won't hear from me again until the Games begin. The goal is to post daily updates on the contest standings, and maybe a few quick thoughts on each day's happenings.

Until then, good luck with your training, and get your mind right for the Games!

Thursday, July 10, 2014

Does Past Games Experience Matter? (And Other Thoughts On Predicting the Games)

Today, I'd like to tackle a topic that's been mentioned quite often in CrossFit Games commentary. It's basically an assumption that's been taken as fact: having experience competing at the CrossFit Games in the past gives an athlete an advantage over first-time Games competitors. I've generally believed this to be true, but without data to support it, we're all really just guessing.

But let's start with the reason I decided to look into the issue. About a week ago, I released my 2014 Games predictions, which were used in the CFG Analysis Pick 'Em that is going on right now. What I started to notice as picks came in is that people tended to wager much more often on past Games competitors. One reason for this is simply familiarity: people know these athletes and have seen them perform well (or perhaps they just like cheering for them). But I believe another reason is that people tend to assume that Games experience matters. And the reason that is a factor here is that my model does not take past Games experience into account.

Why? Well, in constructing these models, I wanted to be able to predict the chances that an athlete would win (or finish top 3 or top 10), not simply make a prediction about where they would finish on average. That meant I couldn't just set up some sort of linear regression model that could account for several variables (such as Regional placing, Open placing, past Games placing, etc.). I needed a model that could generate a range of outcomes, and I felt using this season's results was my best bet. This is a different approach than I took for the Regionals predictions, for three reasons:
  1. Entering the Regionals, there had only been 5 events thus far this season, which is not really enough for me to get a sense of what types of events might come at the Regional level.
  2. Some athletes notoriously coast through the Open, so those results alone would not be a great predictor of Regionals success.
  3. Because so many more athletes compete at Regionals each year compared to the Games (~1400 vs. 85), there was enough historical data for me to build a decision-tree-style model for Regionals. There is simply not enough past Games data to learn about what characteristics give an athlete a chance of winning. Basically what we would learn is: "In order to win, be Rich Froning."
So my solution was to build a pretty unique simulation model that took into account specific results for each athlete for all 11 events this season prior to the Games. It's at this point that I'd like to recite a quotation that one of my work colleagues (a predictive modeling guru if there ever was one) likes to bring up quite often:

"Essentially, all models are wrong, but some are useful." - George Box

Any model we come up with to predict the CrossFit Games will be wrong. Remember, a perfect model would predict with 100% certainty exactly who would finish in what positions. No one would have a 20% chance of winning - they would have a 100% chance or a 0% chance. But creating such a model is impossible. So with that in mind, I acknowledge that my model is wrong. But is it useful? I think so.

The chart below shows the calibration of this model on the 2012 and 2013 Games (combined men and women for both years). This shows how often athletes finished in the top 10, compared with the chances I gave them. A perfectly calibrated model (not necessarily a perfectly accurate one) would have the blue line follow the red line exactly, so that for the athletes I predicted with a 7% chance to finish top 10, exactly 7% of them did finish in the top 10.

As we can see, the model has been pretty well calibrated the past two years. Generally speaking, athletes with a low probability of finishing in the top 10 don't finish in the top 10. The model is also much more accurate than a dull (but perfectly calibrated) model that gives all athletes an equal chance of finishing top 10: my model's mean square error was 11.6% vs. 17.1% for the equal chance model.
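To make the two ideas in this section concrete, here is a minimal sketch of a calibration table and a mean squared error comparison against an equal-chance baseline. The predictions and outcomes are made-up numbers, not the actual 2012-2013 data, and the function names and bin count are my own choices for illustration.

```python
def mean_squared_error(predicted, outcomes):
    """Average squared gap between predicted chance and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(predicted, outcomes)) / len(predicted)


def calibration_table(predicted, outcomes, n_bins=5):
    """Group predictions into equal-width bins.

    Returns (average predicted chance, observed rate, count) per non-empty bin.
    A well-calibrated model has the first two values close in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    table = []
    for members in bins:
        if members:
            avg_p = sum(p for p, _ in members) / len(members)
            rate = sum(o for _, o in members) / len(members)
            table.append((avg_p, rate, len(members)))
    return table


# Hypothetical top-10 chances and results (1 = finished top 10).
predicted = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.05, 0.05]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]

# The "dull" baseline gives every athlete the same chance: the overall hit rate.
base_rate = sum(outcomes) / len(outcomes)
baseline = [base_rate] * len(outcomes)

model_mse = mean_squared_error(predicted, outcomes)
baseline_mse = mean_squared_error(baseline, outcomes)
table = calibration_table(predicted, outcomes)
print(f"model MSE {model_mse:.3f} vs. equal-chance MSE {baseline_mse:.3f}")
```

The baseline is perfectly calibrated by construction, but it cannot separate contenders from long shots, so its error is higher. That is the same kind of comparison as the 11.6% vs. 17.1% figures above.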

But of course, my model is not perfect. And one area where it could be skewed is in how it accounts for (or rather, does not account for) past experience. If past experience is an advantage, then my model is understating (to some degree) the chances for returning Games athletes and overstating (to some degree) the chances for first-timers.

Which brings us back to the original question: Does past Games experience matter? To answer the question, I compiled the results from the Games and Regionals from 2011-2013 and tagged all athletes with Games experience prior to the year of competition (I went all the way back to 2007 to see if athletes had past experience). 

The simplistic way of looking at this is to compare the finishes of athletes with prior experience against those of first-timers. Looking at things this way, we find that returning athletes do finish approximately 8 spots higher than new athletes on average (18.3 vs. 27.8). However, this could simply be due to the fact that the returning athletes are just flat-out better, and their experience had nothing to do with their Games performance.

What we should do to account for this is compare Games performances to Regionals performances in the same year (using the cross-Regional rankings, adjusted for week of competition). In general, we expect athletes who fare better at the Regionals to perform better at the Games. So if Games experience is a factor, the returning competitors should perform better at the Games than their Regionals results would indicate. When we look at things this way, we see that returning competitors do indeed improve their placing by approximately 0.6 spots from Regionals, while new competitors drop by approximately 0.8 spots from Regionals.

Unfortunately, there is still a problem with this comparison. Although Regionals performances are a good indicator of Games performance, there is still a tendency of athletes to regress towards the mean in general. That is, athletes who finish near the top at Regionals don't tend to improve their placement at the Games, while athletes near the bottom at Regionals tend to improve slightly on average. Part of this is due to the fact that if you finish near the top at Regionals, there is basically nowhere to go but down (and the reverse is true for the athletes at the bottom of the Regional standings).

So to be fair, we need to compare returning athletes with first-timers who had similar Regional placements. Since we don't have a huge sample, I split the rankings into buckets of 10. Within each bucket, I found the average Regionals and Games placements of returning athletes and first-timers, as well as the average improvement or decline. The results are presented in the chart below.
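The bucketing step can be sketched roughly as follows. The sample rows are invented for illustration, and the function name and tuple layout are assumptions rather than the actual spreadsheet behind the chart.

```python
from collections import defaultdict


def bucket_comparison(athletes, bucket_size=10):
    """Compare average placement change by Regional-rank bucket.

    athletes: iterable of (regional_rank, games_rank, is_returning).
    Change is regional_rank - games_rank, so a positive value means the
    athlete placed better at the Games than at Regionals.
    """
    changes = defaultdict(lambda: defaultdict(list))
    for regional_rank, games_rank, is_returning in athletes:
        # Ranks 1-10 fall in the bucket starting at 1, 11-20 at 11, and so on.
        bucket_start = ((regional_rank - 1) // bucket_size) * bucket_size + 1
        group = "returning" if is_returning else "first_timer"
        changes[bucket_start][group].append(regional_rank - games_rank)
    return {
        start: {group: sum(vals) / len(vals) for group, vals in groups.items()}
        for start, groups in sorted(changes.items())
    }


# Hypothetical athletes: (Regional rank, Games rank, past Games experience?)
sample = [
    (2, 1, True), (5, 9, False), (8, 6, True),
    (12, 10, True), (15, 20, False), (19, 18, False),
]
result = bucket_comparison(sample)
for start, groups in result.items():
    print(f"Regional ranks {start}-{start + 9}: {groups}")
```

Comparing the two groups inside each bucket, rather than across the whole field, is what keeps regression towards the mean from masquerading as an experience effect.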

For every level of competitors except those near the bottom, the athletes who had past Games experience showed an advantage at the Games over first-timers with similar Regional placements. While we can see that there is significant variation in how much this advantage is worth, if I had to put a number on it, I'd say that Games experience is worth between 4 and 5 spots at the Games. Remember, my current predictions assume all athletes have equal experience, so a reasonable adjustment might be to improve the average rank of past competitors by ~2 spots and drop the rank of new competitors by ~2 spots.

This analysis is, of course, not precise. It is likely that experience matters more for veterans like Rich Froning and Jason Khalipa than it does for someone who has only competed once at the Games before. Moreover, some veteran Games athletes have consistently struggled to match their Regionals performances, while newcomers have overachieved in the past (see Garrett Fisher last year).

I don't plan to adjust my predictions this year, for a few reasons:
  • There is not a simple solution of how to implement this factor into the model framework I have set up;
  • I feel that the predictions are still pretty reasonable on the whole (based on the calibration seen in the past two years);
  • For the Pick 'Em contest, I committed not to change those predictions for reasons of fairness. I suppose I could produce a second set of predictions, but I think that's just creating unnecessary confusion. Anyone entering the contest is welcome to use the information in this post to their advantage if they wish.
Still, when it comes time to make my predictions again next year, I'm going to try to find a way to account for past Games experience. The model still won't be perfect, but hopefully it will be even more useful.

Tuesday, July 1, 2014

So Who CAN Win the 2014 CrossFit Games?

In just three short weeks, the rubber meets the road in Carson, California for the 2014 CrossFit Games. Until then, there is still plenty of time for speculation, and today marks the official speculation from CFG Analysis. Like last year, I've estimated the chances of each male and female athlete winning, placing top 3 and placing top 10.

For those who are interested, these predictions are used as the basis for the CFG Analysis Games Pick 'Em. For more information on that contest, see the original post. Please post your contest entries to comments on that post, NOT this one. Feel free to comment on this post, just don't put your contest entries there.

I won't bore you with too many details before we get to the picks, but here are some key points about the methodology:

  • For the most part, the concept is the same as was done last year. For a full run-down, see my methodology post from last year. The basic idea is that we're using the regional results and the Open results from this season in a variety of combinations to simulate what might happen at the Games.
  • In general terms, the picks are based 80% on the Regionals, 15% on the Open and 5% on last year's Games.
  • The only information used from last year's Games was the athletes' results from the half-marathon row and the Burden Run. The reason for using these two is that there are no regional events of this length, so hopefully these events will give some insight into how the athletes will fare if (when) an event of that length comes up. For athletes that didn't compete last year, they get a random result on that event in each simulation.
  • The regional results have been adjusted to reflect the advantage that athletes in later weeks have over athletes in the earlier weeks (whether it be from additional training or better strategy). I've done this the past two years, but this year the impact of each additional week was stronger than in past years. The one region that I made an extra adjustment for was the North East, since that event was held outside. I treated it as if it had been in week 1 rather than week 4.
  • No advantage is given to returning Games athletes, even for Rich Froning. Certainly some will disagree with that, but we've seen time and again that athletes come out of nowhere (Garrett Fisher 2013, for instance) and past podium athletes can fall off (Matt Chan 2013, for instance). I'll probably devote a post in the next two weeks to looking at how much (if any) Games experience plays a role in predicting an athlete's success, beyond their performances so far this season.
  • There are some athletes who are listed with a 0.0% chance of finishing in certain spots. Obviously every athlete has at least some chance of winning, but this method simply isn't going to account for such true long-shots. Last year, the longest shot on either side to finish in the top 10 based on my picks was Anna Tunnicliffe at 3%.
  • As always, these picks aren't a personal judgment about any athlete, and of course, they are just for fun. Much respect for any athlete that even makes it to this level to begin with.
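For anyone who wants a feel for what "simulating the Games" can mean, below is a deliberately simplified sketch. It is not the actual model: it assumes each simulated Games is a random draw of past events, with the 80/15/5 split used as sampling weights, and it ignores the week-of-competition adjustment and the random fill-in for athletes without a result. All names and data are hypothetical.

```python
import random


def simulate_win_chances(results, event_sources, source_weights,
                         n_events=10, n_sims=2000, seed=0):
    """Estimate win chances by resampling past events.

    results: {athlete: {event: rank}}, lower rank is better.
    event_sources: {event: source label}, e.g. "regional" or "open".
    source_weights: {source label: share of draws that source should get}.
    """
    rng = random.Random(seed)
    events = list(event_sources)
    per_source = {}
    for src in event_sources.values():
        per_source[src] = per_source.get(src, 0) + 1
    # Split each source's weight evenly across that source's events.
    weights = [source_weights[event_sources[e]] / per_source[event_sources[e]]
               for e in events]
    wins = {athlete: 0 for athlete in results}
    for _ in range(n_sims):
        drawn = rng.choices(events, weights=weights, k=n_events)
        totals = {a: sum(ranks[e] for e in drawn) for a, ranks in results.items()}
        best = min(totals.values())
        tied = [a for a, t in totals.items() if t == best]
        wins[rng.choice(tied)] += 1  # break ties at random
    return {a: w / n_sims for a, w in wins.items()}


# Hypothetical ranks for three athletes on four past events.
results = {
    "A": {"R1": 1, "R2": 1, "O1": 2, "G1": 3},
    "B": {"R1": 2, "R2": 3, "O1": 1, "G1": 1},
    "C": {"R1": 3, "R2": 2, "O1": 3, "G1": 2},
}
event_sources = {"R1": "regional", "R2": "regional", "O1": "open", "G1": "games"}
source_weights = {"regional": 0.80, "open": 0.15, "games": 0.05}

chances = simulate_win_chances(results, event_sources, source_weights)
print(chances)
```

Counting how often each athlete comes out on top across many simulated weekends is what turns a set of event results into a chance of winning, rather than a single predicted placement.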
With those items out of the way, let's get on with the picks. These picks are subject to change in the event that athletes drop out prior to the competition (or if they add a late wildcard, for some reason), but otherwise, consider them final. I will make notes in this post of any changes that have occurred.

[UPDATE 7/8: HQ just announced they will be paying out prize money for the top 20 finishers, but anywhere you see the term "money" in this post, it refers to top 10.  I have re-posted these charts with the heading changed to say "Top 10", but I will not be re-doing the predictions to give odds for finishing top 20.] 

Monday, June 23, 2014

Introducing the CFG Analysis 2014 Games Pick 'Em - NOW OPEN

I've toyed with the idea of putting together some sort of CrossFit Games pool or contest for the past couple of years, but a couple of things held me back.  One, I wasn't sure if there was enough interest and enough readership here to support it; and two, I really didn't know how a CrossFit Games pool should work.

Given the pretty decent volume of responses we got for the SWAGs during this year's Open, I think (hope?) there is enough interest to do something more formal for the Games this year. So we're going to try and see what happens.

But how will it work? Well, since the CrossFit Games are unlike most sports where you see pools or pick 'em contests (NCAA basketball, college football bowls, World Cup), the set-up of this contest will be decidedly different as well. It will also give me a chance to put some of the analysis I've been doing here to use. So here's the set-up:
  • Around the beginning of July, I will release my predictions for this year's CrossFit Games. These predictions, like last year, will contain the expected chance of each individual athlete winning, placing top 3 and placing top 10. These predictions are based almost entirely on the results from this year's Open and Regionals.
  • Using these predictions, you enter the contest by placing "wagers" on three male athletes and three female athletes. Each person entering the contest will get 20 points with which to make these wagers. Your result in the contest is based on how much your wagers end up paying off.
  • The payoff for each wager is based on my predictions. For instance, last year I gave Rich Froning a 59% chance of winning. If you wagered 5 points on Froning to win, you would receive 5/0.59 = 8.47 points for that pick. If you wagered 5 points on Jason Khalipa to win, you would receive 0 points since he did not win.
  • You must make exactly 6 wagers:
    • One male, one female to win
    • One male, one female to place top 3
    • One male, one female to place top 10
  • You may spread your 20 points among these 6 wagers in any way you choose. The wagers must be in whole point increments and you must wager at least 1 point on each pick.
As an example, let's pretend I was entering this contest last year. I decide to wager as follows:
  • 5 points Rich Froning to win (59%)
  • 3 points Jason Khalipa to place top 3 (34%)
  • 3 points Lucas Parker to place top 10 (27%)
  • 4 points Sam Briggs to win (32%)
  • 4 points Camille Leblanc-Bazinet to place top 3 (26%)
  • 1 point Alessandra Pichelli to place top 10 (24%)
I would have made 8.47 on Froning, 8.82 on Khalipa, 0 on Parker, 12.50 on Briggs, 0 on Leblanc-Bazinet and 4.17 on Pichelli. In total, I'd have 33.96 points (this would be 13.96 in profit). My guess is that would have been a pretty good effort.
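The rules and the worked example above can be checked with a short sketch like the one below. The function names and the dictionary layout are my own for illustration, and the chances are the rounded percentages from the example.

```python
def validate_wagers(wagers, budget=20, n_picks=6):
    """Check the entry rules: six picks, whole points, at least 1 each, 20 total."""
    if len(wagers) != n_picks:
        raise ValueError(f"expected exactly {n_picks} wagers")
    for w in wagers:
        if not isinstance(w["points"], int) or w["points"] < 1:
            raise ValueError("each wager must be a whole number of at least 1 point")
    if sum(w["points"] for w in wagers) != budget:
        raise ValueError(f"wagers must add up to {budget} points")


def total_payoff(wagers):
    """A correct pick pays points / chance; an incorrect pick pays nothing."""
    return sum(w["points"] / w["chance"] for w in wagers if w["hit"])


# The example entry from last year's predictions.
entry = [
    {"pick": "Rich Froning to win", "points": 5, "chance": 0.59, "hit": True},
    {"pick": "Jason Khalipa top 3", "points": 3, "chance": 0.34, "hit": True},
    {"pick": "Lucas Parker top 10", "points": 3, "chance": 0.27, "hit": False},
    {"pick": "Sam Briggs to win", "points": 4, "chance": 0.32, "hit": True},
    {"pick": "Camille Leblanc-Bazinet top 3", "points": 4, "chance": 0.26, "hit": False},
    {"pick": "Alessandra Pichelli top 10", "points": 1, "chance": 0.24, "hit": True},
]

validate_wagers(entry)
payoff = total_payoff(entry)
profit = payoff - 20
print(f"payoff {payoff:.2f}, profit {profit:.2f}")
```

Because the payoff is points divided by chance, a long shot pays far more per point than a favorite, which is what makes the choice of how to split the 20 points interesting.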

There is no entry fee. The prize? Glory. Lots of glory. If this works out well this year, maybe I'll figure out a way to actually give something tangible away next year.

So be on the lookout for my predictions coming in the next couple weeks. After the predictions come out, post your wagers to the comments on THIS POST. I'll try to provide updates after each day of the Games regarding the current standings. Have fun everyone!

[UPDATE 7/1: Predictions are below and the contest is now open. These predictions are final, with the following exceptions:

  • If an athlete withdraws prior to the first event, all other athletes' chances will be adjusted upward to reflect the loss of that athlete. I'll repost the predictions as soon as I can.
  • If an extra athlete gets a wild card before the Games, I will be forced to re-run all the predictions. I won't change the methodology, so you can expect the chances for most athletes to remain similar.
In the event that I do have to change the predictions for the reasons above, all wagers will be against the revised picks. You can change your own picks anytime up until the start of the first event, for any reason.

These predictions are rounded to the nearest 0.1%, but I will use the true value when calculating the payoffs. Any athletes that truly have odds of 0.0% will be given a payoff of 5,000-to-1.

OK, with that, below are the picks. For methodology details, please see this post.]

[UPDATE 7/8: HQ just announced they will be paying out prize money for the top 20 finishers, but anywhere you see the term "money" in this post, it refers to top 10.  I have re-posted these charts with the heading changed to say "Top 10", but I will not be re-doing the predictions to give odds for finishing top 20.]

Tuesday, June 10, 2014

2014 Regional Review

Every year, watching the Regional competition, seeing how incredibly difficult it is to qualify for the Games even for the most seasoned of athletes, it's hard to imagine that those who do qualify have any weaknesses. Yet in a few short weeks, we will once again see the programming elevate to yet another level, exposing any deficiencies and allowing the best of the best to shine. Even with a few big name athletes out of the mix, it's going to be another tremendous competition, possibly the best to date (the women's side in particular is WIDE open).

But before we move fully into Games mode, let's take a more thorough look back at this season's Regional competition.

First, we'll start with the programming. Although I thought the programming was solid this year, I still preferred last season's events a bit more. A big reason for that is the handstand walk event - I simply cannot get over the fact that a single attempt at walking on one's hands was worth the same number of points as a classic CrossFit metcon like event 3, event 4 or event 7. It just seems to me that if you're evaluating an athlete, those other events tell you a lot more than a handstand walk. Part of that is opinion, but it is also backed up by the numbers. Below is a chart showing the correlation between an athlete's ranking in each Regional and Open event this season and the athlete's ranking across all other events. Higher numbers indicate that the best overall athletes are typically finishing high on the event, which generally means the event was a good test of fitness.

As I noted last week, the events that really stood out on the positive side were Regional Event 3, Regional Event 7 and Open Event 4. All three had correlations near 80% for both men and women. I've admitted already that my doubts about Regional Event 7 were unfounded, but I'll state it again: this was a pretty great event. On the flip side, you can see that the handstand walk (Regional Event 2) had one of the lowest correlations for both men and women, although the run/rope climb workout (Regional Event 5) had the lowest correlation of any event. My theory here is that since most other events generally favored smaller athletes, many top athletes on the other workouts struggled a bit here since height was a clear advantage.
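A rough sketch of the event-versus-everything-else correlation is below. It assumes the "ranking across all other events" is built by summing each athlete's ranks on the remaining events and re-ranking those totals, which may differ from how the chart was actually produced. The data and function names are hypothetical.

```python
from math import sqrt


def to_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def event_vs_rest(ranks_by_event, event):
    """Correlate ranks on one event with overall rank on all other events."""
    target = ranks_by_event[event]
    rest_totals = [
        sum(r[i] for e, r in ranks_by_event.items() if e != event)
        for i in range(len(target))
    ]
    return pearson(target, to_ranks(rest_totals))


# Hypothetical ranks for five athletes (same athlete order in every list).
ranks_by_event = {
    "E1": [1, 2, 3, 4, 5],
    "E2": [2, 1, 3, 5, 4],
    "E3": [1, 3, 2, 4, 5],
    "E4": [4, 5, 1, 2, 3],  # an oddball event that reshuffles the field
}
r_e1 = event_vs_rest(ranks_by_event, "E1")
r_e4 = event_vs_rest(ranks_by_event, "E4")
print(f"E1: {r_e1:.2f}, E4: {r_e4:.2f}")
```

An event like the made-up "E4" here, where the overall leaders land mid-pack or worse, is the pattern a low correlation is flagging.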

Which brings me to a second topic on the programming: did the Regional programming really favor smaller athletes, as many (including me) expected it would? This came up on the Update Show last week, and predictably it was dismissed as being a myth. The evidence? Well, since Tommy Hackenbruck, Jason Khalipa and Elizabeth Akinwale all qualified, the programming must not have favored smaller athletes, according to Pat Sherwood. Needless to say, as a stats guy (and just a fan of the sport in general), hearing this topic brushed off so casually was irritating for a number of reasons:

  • Anyone who knows anything about data knows that you can't prove a point like this simply by cherry picking a few oddball cases. Can we assume height is not advantageous in basketball because Nate Robinson is in the NBA? Of course not.
  • Tommy Hackenbruck is not even that tall. The man is 6'1", so let's keep things in perspective here. And Elizabeth Akinwale is only 5'7"!
  • As much as HQ talks about data points and all the insights they can gain from the numbers, we so often get responses like this instead of actual analysis. HQ has the numbers at their disposal to provide better evidence to support their point, yet they choose not to. I love this sport, but we lose credibility when the organization in charge so rarely admits any fault.
None of this is to say that Sherwood was necessarily wrong. His opinions were simply not supported by facts. That's what is irritating.

However, the numbers are available to see if the HQ company line is in fact correct. I spent a couple of hours compiling the heights and weights of male Games qualifiers from the past three years (I'd like to get the female numbers at some point when I have some more time). Here are the raw averages for the field in those years:
  • 2012 - 69.9 inches, 198 pounds
  • 2013 - 69.9 inches, 197 pounds
  • 2014 - 69.4 inches, 192 pounds
Clearly there is a drop in size in the past year, and if we throw out the bottom and top two values from each year (including 6-foot-5 Aja Barto in 2012 and 2013), the trend actually becomes a bit more apparent.

  • 2012 - 69.9 inches, 199 pounds
  • 2013 - 69.7 inches, 196 pounds
  • 2014 - 69.4 inches, 192 pounds
Now, is this a drastic drop? No. But the fact that the average Games athlete is a half-inch shorter and 5 pounds lighter than a year ago is not meaningless. Perhaps you could argue that the programming in prior years was too favorable towards larger athletes (I wouldn't argue that, but you could). But to just ignore the numbers altogether is a bit disingenuous.
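For anyone who wants to run the same kind of check on their own numbers, here is a minimal Python sketch of the "throw out the two lowest and two highest values" average described above. The heights in it are made up purely for illustration (they are not the actual qualifier data), and the function name `trimmed_mean` is just a label chosen here.

```python
from statistics import mean


def trimmed_mean(values, k=2):
    """Average of the values after dropping the k smallest and k largest."""
    if len(values) <= 2 * k:
        raise ValueError("need more than 2*k values to trim")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


# Hypothetical heights in inches (NOT the real Games qualifier data).
# The 77 stands in for a single very tall outlier.
heights = [66, 67, 68, 69, 69, 70, 70, 71, 72, 77]

raw = mean(heights)
trimmed = trimmed_mean(heights, k=2)
print(f"raw average: {raw:.1f} in, trimmed average: {trimmed:.1f} in")
```

The point of trimming is that one or two extreme athletes cannot move the average much, so year-to-year changes in the trimmed figure say more about the bulk of the field.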

Something else that came out of this investigation (which I hope to expand on at a later date) was that I decided to look at how the distribution of heights for Games qualifiers compared to the general population. My theory was actually that since the programming is generally so balanced, we'd see that the Games athletes have a distribution of heights that is pretty typical. Below are the comparisons (Games athletes distribution based on the qualifiers from the past three years combined, general public data from the Census Bureau).

Although the shape of the distribution isn't terribly different, we do see that Games athletes are more concentrated around the average (approximately 5-10) than the general public. In fact, this has been a trend in recent years. The standard deviation in height among Games qualifiers has dropped from 2.4 inches in 2012 to 2.1 inches in 2014, and the standard deviation in weight has dropped from 15.9 pounds in 2012 to 13.2 pounds in 2014. I expect this will continue in the future as the field grows and it becomes harder and harder to overcome any natural disadvantages.
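As a rough sketch of how the spread comparison can be computed, the Python below takes a list of heights per year and reports the average and standard deviation for each. The data is hypothetical and only meant to show the mechanics, and using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption, since the figures above don't specify which was used.

```python
from statistics import mean, stdev

# Hypothetical heights in inches per year (NOT the real qualifier data).
heights_by_year = {
    2012: [66, 68, 69, 70, 70, 71, 73, 77],
    2014: [67, 68, 69, 69, 70, 70, 71, 72],
}


def spread_summary(heights):
    """Return (average, sample standard deviation) for a list of heights."""
    return mean(heights), stdev(heights)


for year, heights in sorted(heights_by_year.items()):
    avg, sd = spread_summary(heights)
    print(f"{year}: average {avg:.1f} in, standard deviation {sd:.1f} in")
```

A smaller standard deviation in the later year is what "more concentrated around the average" looks like in the numbers.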

This brings me to my final topic for today: who were the most impressive athletes at this year's Regionals? Obviously, Rich Froning remains the dominant athlete on the men's side, staying well clear of the field despite (reportedly) battling a nasty cold all weekend. And for sure, Scott Panchik again showed that he's a podium contender by hanging right with Froning (and also finishing well ahead of the third-place athlete in the Cross-Regional Comparison). On the women's side, clearly Camille showed up big-time, as she typically has in past Regionals.

But considering what we saw above regarding the concentration of "average size" athletes, it does make performances by athletes like Hackenbruck, Nate Schrader (6'1") and Becca Voigt (5'9") even more impressive. In fact, I'd venture to say these athletes will do even better at the Games, since the programming could be more favorable to their size. Hackenbruck and Schrader were both in the low 20s in the Cross-Regional Comparison (the best of any athletes over 5-11), and I could easily see both in the top 10 at the Games.

That's it for today. In the next few weeks, we'll get into adjusting those Cross-Regional Comparisons, and of course, assessing who actually could win the CrossFit Games. Until then, enjoy your training, and I'll see you back here soon.

Thursday, May 29, 2014

Regional Predictions, Week 4

Although last week's regional competitions had their share of drama, I'll admit it felt like a bit of a letdown after the amazing weekend prior. Thankfully, this fourth and final weekend looks like it has some fantastic stuff in store. I mean, how could Northern California's men's competition not be insane? We have seven former Games competitors vying for three spots, including three men who finished in the top 10 at the Games last year. Like we saw in the Central East, there will be some men not heading to the Games that probably would have gone in virtually any other region in the world.

With that in mind, let's get to some assorted topics before we move onto the predictions for week 4:
  • Taken in a vacuum, I don't have a problem with Dave Castro's statement (speaking for HQ I presume) that there will not be any wild cards given out this year. However, in the context of this season, I'm not a fan.
    • Why even announce that wild card spots will be available (which they did earlier this year) if you are going to rule out that possibility before the Regionals have even finished? I cannot conceive of a scenario where wild cards would make more sense than they do for Sam Briggs this year. She is the reigning Fittest Woman on Earth, she had a single bad event in one of the most volatile events ever programmed at Regionals (1-attempt handstand walk) and she still finished fourth in a stacked region. If you're not going to use a wild card in that situation, then you're never going to use it.
    • Talent is so clearly bunched in a few regions (and has been for a few years). I can understand the argument that the regionals are set up with a limited number of spots in each region to increase drama and make things more exciting. However, I find it difficult to accept the argument that this system is ideal for finding the fittest athletes in the world. I get that cross-regional comparisons are not perfect, but I challenge anyone to argue that Graham Holmberg (4th in Central East) is not among the 40 best CrossFitters in the world. As it stands now, he is ranked ahead of the champions from 9 other regions. For Castro to argue that "the right athletes" are going to the Games seems a bit disingenuous. If you're just setting it up this way for drama, that's fine, but let's just call it what it is.
  • Although we have one week to go, the data from across all regions has allowed me to get a sneak peek at some interesting things from this year's regionals.
    • In terms of correlation with success across all Regional and Open events, it appears that events 3 and 7 are the top events at this point. I'll admit when I was wrong, and I was wrong on event 7. The top athletes are all crushing it, and it is damn exciting. Event 3 is a bit surprising, but again, look at the athletes who are doing well there, and they're usually dominating across the board.
    • On the other end of the spectrum, event 5 for the men actually has the lowest correlation with overall success. My guess here is that this is the one event this season that truly favors taller athletes, and so you are seeing some athletes with huge performances who otherwise are struggling. For the women, this event is not so bad, mainly because there are no athletes jumping 10-11 feet in the air and getting to the top of the rope in a couple pulls.
    • Not surprisingly, the two single-modality events (1 and 2) are among the least correlated with overall success for both men and women. Event 2 is slightly worse than event 1, but not by as much as you might think.
    • Events 4 and 6 are kind of middling in this respect. I expected event 6 to really bring out the top all-around athletes, but it might just be so grueling that it heavily favors the endurance specialists.
    • If we look at Open events in this context, 14.3 has the lowest correlation with overall success among Regional athletes (as it did for the entire Open field). On the other side, 14.4 has the highest correlation with overall success among Regional athletes (as it did for the entire Open field). In fact, it is basically neck-and-neck with Regional event 3 for the top spot across all events this season.
    • Some have suggested that results in the handstand walk might be correlated with success in event 4 (which has tons of handstand push-ups). It doesn't appear that way; ranks on those two events are not particularly correlated (52% for men, 44% for women - both of those figures are middle of the road in this respect). The only combination of events that really stands out is events 1 and 7, which were 77% correlated for women and 68% correlated for men.
  • Last week I posted a chart and some statistics regarding the accuracy of my predictions (I should note that these are after removing athletes who withdrew prior to event 1). After week 3, the calibration plot looks about the same, but the mean-square error has dropped from 4.38% to 3.93%. For reference, last year's model was 4.43% and a model giving each athlete an equal chance would be about 6.40%. Below is the calibration plot (read last week's post for an explanation):
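To make the mean-square error figures a little more concrete, here is a minimal Python sketch of one common way to score probability predictions: average the squared gap between each predicted chance of qualifying and what actually happened (1 if the athlete qualified, 0 if not). This is an assumption about how the number is calculated, not a copy of the actual model, and the region and probabilities below are invented for illustration.

```python
def mean_square_error(probs, outcomes):
    """Average squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical region: 10 athletes, 3 qualifying spots (NOT real data).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Made-up model probabilities for the same 10 athletes.
model_probs = [0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.03, 0.02]

# Baseline: every athlete gets the same chance, spots / athletes.
equal_probs = [3 / 10] * 10

model_mse = mean_square_error(model_probs, outcomes)
baseline_mse = mean_square_error(equal_probs, outcomes)
print(f"model: {model_mse:.2%}, equal-chance baseline: {baseline_mse:.2%}")
```

Lower is better. In this toy setup the equal-chance baseline works out to p × (1 - p), where p is spots divided by athletes, so it depends only on the size of the field and the number of spots. A model only earns its keep if it beats that number.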

Alrighty... with all that out of the way, let's get on to the predictions. This week, the only athlete for whom I made a manual adjustment to the model was Jason Khalipa. This year's events might not really favor him, but the guy has been so freaking consistent over the past 6 years that I felt he warranted special consideration.

With that said, here you go. Enjoy the final week of Regionals, everyone!

[Update 5/31: I've made a couple fixes to account for women's name changes since last year, as well as making the adjustment for Andrea Ager that I suggested in the comments a couple nights ago. I treated her as if she did not compete at Regionals last year, rather than as if she finished very low. Her low finish was due to a DQ in the OHS event, not due to a poor performance overall.]

Note that Africa only has one qualifying spot. All other regions this week have three.

Also note that the pictures look prettier this week because I'm posting from a Mac. Excel is terrible on a Mac, but at least it exports nicely to pictures.