Boston Marathon 2025: Predicting the Buffer

An analysis of Boston Marathon qualifying standards and performance predictions for the 2025 race.

Will I run Boston 2025?

Update: Our model was accurate within 7 seconds of the 2025 Boston cutoff! (Predicted: 6:44 | Actual: 6:51) We'll continue using this methodology to help runners predict their odds.

Running has boomed, or rather sonic-boomed, since COVID. Singles are off dating apps and into running clubs. The former celebrity runners like Will Ferrell have been replaced with a chic crop including Diplo (who famously dropped acid during his first marathon and is now doing a national 5k tour jokingly called "Diplo's Run Club") who are taking the running vogue to new heights in this Zoomer world.

But what does this mean for those who have the dream of running the Boston Marathon? With the application window now closed for Boston 2025 and a record number of Boston applicants, will you actually get in and be running this April? You can enter your information to find out your odds here.

Figure 0: My (dismally) low odds of running Boston despite a 6:56 buffer.

I'm not the first person to try to answer this question. A few folks have already posted their predictions, and to do so they've done some awesome work collecting a lot of data from many Boston qualifying races for the 2025 window. While I find their methods credible, I thought it would be interesting to take my own, simpler approach. This led me to come up with a prediction of 7:38, and a range of 6:19 to 8:57, which is based on the number of applicants the BAA announced yesterday (36k). I'll explain in detail below.

The key insight that allows us to use a simple prediction is that the cutoff time is highly correlated (Pearson coefficient of .94) with the number of applicants the BAA rejects, and even more highly correlated (.98) with a transformation of that number and the field size (think of the ratio of rejected applicants to the field size as a proxy for competitiveness; see Figures 1 and 2). This means that if we can estimate the number of athletes the BAA will reject, we can estimate the cutoff time.

Figure 1: Cutoff times, $C$ (in seconds), plotted against the number of rejected applicants. The variance in the number of rejected applicants explains most of the variance in the cutoff time, so we just need to estimate the number of rejected applicants. Data from BAA.

Figure 2: Cutoff times (in seconds) plotted against the log-transformed number of rejected applicants and field size. By accounting for the field size and the nature of the applicants' times, we get a much closer fit.
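
For readers who want to reproduce the correlation check on their own data, here is a minimal sketch in plain Python. The function names are hypothetical, and the transformation shown (the log of rejected applicants divided by field size) is only an assumption about what the log-transformed feature could look like; no BAA data is included.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def competitiveness(rejected, field_size):
    """ASSUMED transformation: log of the rejected-to-field-size ratio."""
    return math.log(rejected / field_size)

def correlations(rejected_by_year, field_size_by_year, cutoff_seconds_by_year):
    """Return (raw, transformed) correlations with the cutoff time."""
    raw = pearson(rejected_by_year, cutoff_seconds_by_year)
    transformed = pearson(
        [competitiveness(r, f) for r, f in zip(rejected_by_year, field_size_by_year)],
        cutoff_seconds_by_year,
    )
    return raw, transformed
```

Feeding in one value per year for rejected applicants, field size, and cutoff (in seconds) returns the two coefficients to compare against the .94 and .98 quoted above.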

Cutoff Determinism

If we had perfect information, it would be possible to calculate the cutoff time directly. The cutoff time, $C$, is determined by solving the optimization

$\min_{C} \; C$

s.t.

$\sum_{e \in E} I_{[t(e) < Q(a(e), g(e)) - C]} \leq \kappa$

where

  • $E :=$ the set of all entrants; each entrant has a time $t$, age group $a$, and gender $g$. We overload the notation and allow $t$, $a$, and $g$ to be functions mapping an entrant to their time, age group, and gender, respectively.

  • $Q :=$ a function that maps age groups and gender to a qualification time.

  • $\kappa:=$ the number of time-qualifying athletes the Boston Marathon allows to race via time qualifying; this is a parameter set by the BAA.

  • Field Size = $\kappa $ + (# charity runners); this is usually announced ahead of registration to be 30k. $\kappa$ and $|E|$ are announced after registration.

  • $r := |E| - \kappa$, the number of rejected athletes (a.k.a. Boston dreams crushed).
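
To make the optimization concrete, here is a small sketch of how the cutoff could be computed if we really did know every entrant's buffer, i.e. $Q(a(e), g(e)) - t(e)$ in seconds. The function names and the toy buffers are hypothetical, and the sketch assumes the cutoff cannot be negative.

```python
def cutoff_seconds(buffers, kappa):
    """Smallest C >= 0 such that at most `kappa` entrants have buffer > C.

    `buffers` holds one value per entrant: qualifying standard minus
    qualifying time, in seconds. `kappa` is the number of time-qualifier spots.
    """
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    ranked = sorted(buffers, reverse=True)
    if len(ranked) <= kappa:
        return 0  # everyone fits, so no cutoff is needed
    # The (kappa + 1)-th best buffer is the first one that must be excluded.
    return max(0, ranked[kappa])

def accepted_count(buffers, cutoff):
    """Number of entrants whose buffer strictly beats the cutoff."""
    return sum(1 for b in buffers if b > cutoff)

# Toy example (made-up buffers, not real applicants).
toy_buffers = [600, 450, 300, 120, 30]
toy_cutoff = cutoff_seconds(toy_buffers, kappa=3)
print(toy_cutoff, accepted_count(toy_buffers, toy_cutoff))
```

Sorting reduces the problem to reading off a single element, which is why the hard part is not the computation but knowing the buffers and $\kappa$ in the first place.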

The parameter $\kappa$ is what makes it impossible to compute the cutoff time exactly: even if we had perfect information about every applicant (which we don't), we don't know how many time-qualifying athletes the BAA wants in a given year. However, we do have an upper bound: every year the BAA announces the field size, and it's safe to assume that $\kappa \leq$ Field Size. But how useful is that? Probably not very. Last year, the field size was 30k, but the BAA only admitted around 22k time-qualifying athletes. This year the field size is again 30k, and although that probably won't mean much, we can still use it to calculate some best-case and worst-case scenarios.

Building a model

Since the first-principles approach of calculating the cutoff time directly seems impractical (this is what others have tried), we can fall back on simpler means: estimating $C$ from the number of rejected athletes, $r$ (we can find ways to estimate that number later).

If we build a simple linear model (which seems reasonable given the correlation of .98 we found earlier), we get something like Figure 3 below, which at least passes the visual sanity check. This is just a linear model that takes the number of rejected athletes and the field size as inputs.

Figure 3: We can see that the model does decently well at predicting the times when it knows the number of rejected athletes; the confidence intervals are estimated from the model error.
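
As a rough illustration of this kind of model, here is a least-squares sketch in plain Python. It is a simplification, not the exact model behind Figure 3: it assumes the two inputs are collapsed into a single feature, $\log(r / \text{Field Size})$, and it uses the residual standard deviation as one possible way to gauge model error. All names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_std(xs, ys, slope, intercept):
    """Residual standard deviation (needs at least three points)."""
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

def predict_cutoff(rejected, field_size, slope, intercept):
    """Predicted cutoff in seconds from the ASSUMED single feature."""
    return slope * math.log(rejected / field_size) + intercept

# Synthetic points lying exactly on y = 2x + 1, only to show the mechanics.
slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(slope, intercept, residual_std([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], slope, intercept))
```

With historical data, `xs` would be the transformed feature per year and `ys` the cutoff in seconds; an interval around a prediction could then be built from a multiple of the residual standard deviation.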

Some very passionate folks track the number of likely applicants this year (one popular estimate was 34,402), but even better is that the BAA has been announcing this number after registration closes: this year there were 36,406 applicants ($|E| = 36,406$).

We can probe the model over a range of values of $\kappa$ (historically, only 22,000 to 24,000 entries are allocated to time-qualified runners, but it's fun to poke around outside that range too), which gives us a corresponding range for the number of rejected applicants. This is shown in Figure 4.
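
The arithmetic behind that range is just $r = |E| - \kappa$. A quick sketch using the announced applicant count and the historical 22k to 24k range (the function name is hypothetical):

```python
APPLICANTS = 36_406  # |E|, as announced by the BAA this year

def rejected_count(applicants, kappa):
    """Number of rejected athletes, r = |E| - kappa."""
    return applicants - kappa

for kappa in (22_000, 24_000):
    print(f"kappa={kappa}: rejected={rejected_count(APPLICANTS, kappa)}")
# kappa=22000: rejected=14406
# kappa=24000: rejected=12406
```

So the historically plausible region corresponds to roughly 12,400 to 14,400 rejected applicants.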

Figure 4: The model's predictions for cutoff times given the number of rejected athletes. To find a reasonable guess for this year's cutoff time, we just need to figure out where we are on that green line. If history is helpful, we are likely in the yellow region.

Evaluating our odds

It's likely that the number of time-qualifying athletes, $\kappa$, is going to be 22k again, given the race's love of charity spots. (I personally feel these are a critical component of the event and give the race some much-needed warmth. A Boston Marathon with only time qualifiers would be an ultra-exclusive race that lacks the broader community spirit it's known for, something closer to an elite track meet than a celebration of all runners.)

At 22k time-qualifying spots, the model predicts a cutoff of 7:38, with a confidence interval of 6:19 to 8:57. If the BAA increases time-qualifying spots to 24k, the model predicts a cutoff of 6:44, with a confidence interval of 5:29 to 8:00.
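
Since the figures are plotted in seconds while the predictions are quoted as minutes and seconds, a small conversion helper may be handy when comparing the two. This is only a convenience sketch with hypothetical function names.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '7:38' into total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total_seconds):
    """Convert whole seconds back into an 'm:ss' string."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"

# The 22k scenario quoted above: point estimate and interval bounds.
low, mid, high = (to_seconds(t) for t in ("6:19", "7:38", "8:57"))
print(low, mid, high)          # 379 458 537
print(mid - low, high - mid)   # 79 79
```

In the 22k scenario the interval is symmetric, 79 seconds on either side of the point estimate.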

In the coming weeks, we’ll see how close these estimates are. In the meantime, get off of Reddit, don't hold your breath, and go out for a run!