Poll aggregators are full of “what ifs”. They are particularly fond of “If Trump or Harris wins State X, what are their chances of winning overall?” Here’s The Economist, for example, looking at the chances of each candidate winning overall if they win or lose selected states.
I like these conditional probabilities too. I’ve previously used them to show that Pennsylvania (which, as you can see here, sits high up both lists) is in fact the most “informative” state, in the sense that if you wanted to know the overall national result, and could learn the result in just one state, then Pennsylvania would be the one you’d choose. By a mile.
Some use these probabilities to reason in a slightly different way though. They use them to argue that the state highest up both Harris and Trump’s lists is the one to focus campaign resources on.
That is, the reasoning goes, find the state that, if you win it, will increase your chances of winning by the greatest amount and, if you lose it, will increase your opponent’s chances of winning by the greatest amount. That state must be the one on which to focus all campaign resources, spending, and candidate time.
And so, if you follow this reasoning, both candidates should just go and camp in Pennsylvania until election day.
The thing is, this is a mistake. In the way the models are set up, this conclusion doesn’t follow at all. It’s a somewhat subtle mistake - and reasoning this way won’t always lead you to campaign in the wrong place, but it’s a mistake nonetheless.
Why not Pennsylvania, only Pennsylvania and always Pennsylvania?
The reason why it doesn’t necessarily follow that each candidate should campaign in the “most informative”, or the “closest to the tipping point”, states lies in how these probabilities are calculated and, in turn, in what scenarios the models are dealing with.
Let’s start with how the models generate the probabilities. The model has various parameters that are unknown: e.g., individual and systemic polling error, drift in public opinion between now and the election, turnout differences and so on. Each of these is given a probability distribution over its possible values, and the model is then asked to generate thousands of “draws” from these distributions and to compute the overall result for each one.
This might look something like this: red is a national Electoral College victory for Trump, blue is a victory for Harris.
Next, we ask the model to sort these scenarios into those where Trump wins Pennsylvania and those where he does not. It will look something like this.
Now, we toss out the right-hand side and simply count the proportion of the space on the left where Harris still manages to win: it’s very low. And this is where we get conditionals like those given above: that if Trump wins PA, he has a 90% chance of winning the whole thing. This means that in the 500 or so simulations where Trump wins Pennsylvania, he also wins nationally in about 450 of them.
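The draw, sort and count procedure can be sketched in a few lines of Python. This is a toy, not The Economist’s model: apart from the 0.8-point Harris lead in PA used elsewhere in this piece, the polling margins are made-up placeholders, and the 219 “safe” EV assumed for Trump and the two error sizes (`national_sd`, `state_sd`) are hypothetical assumptions chosen only to make the mechanics visible.

```python
import random

# Hypothetical inputs for illustration only, not any published model's numbers:
# state -> (electoral votes, current Trump-minus-Harris polling margin in points).
SWING_STATES = {"PA": (19, -0.8), "MI": (15, -1.5), "WI": (10, -1.0), "GA": (16, 0.5),
                "NC": (16, 0.3), "AZ": (11, 1.0), "NV": (6, -0.2)}
TRUMP_SAFE_EV = 219  # assumed EVs Trump holds outside these seven states
NEEDED = 270         # a 269-269 tie is simply counted as "not a Trump win" here


def simulate(n_draws=1_000, national_sd=2.5, state_sd=2.0, seed=42):
    """Return one (trump_won_pa, trump_won_overall) pair per simulated election."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        # The unknowns: one shared national shift (polling error plus late drift)
        # and a separate, independent error for each state.
        shift = rng.gauss(0.0, national_sd)
        won = {s: m + shift + rng.gauss(0.0, state_sd) > 0
               for s, (_, m) in SWING_STATES.items()}
        ev = TRUMP_SAFE_EV + sum(SWING_STATES[s][0] for s, w in won.items() if w)
        results.append((won["PA"], ev >= NEEDED))
    return results


draws = simulate()
# Step 1: keep only the draws where Trump wins PA (the "left-hand side").
pa_wins = [overall for pa, overall in draws if pa]
# Step 2: count how often he also wins nationally within that subset.
p_conditional = sum(pa_wins) / len(pa_wins)
p_overall = sum(overall for _, overall in draws) / len(draws)
print(f"{len(pa_wins)} of {len(draws)} draws have Trump winning PA; "
      f"he wins overall in {sum(pa_wins)} of them ({p_conditional:.0%}).")
print(f"Unconditional chance of a Trump win: {p_overall:.0%}")
```

Because every state shares the same `shift` within a draw, the draws where Trump carries PA are disproportionately ones with a pro-Trump national swing, which is why the conditional figure comes out well above the unconditional one.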
But all this means is that we’ve selected (that is, conditionalised on) scenarios in which Trump wins Pennsylvania. It does not ask why Trump won Pennsylvania. To make the point clear, here are two possible reasons why he might have won (there are obviously others as well).
Natural victory: Trump won Pennsylvania as part of a nation-wide trend. Some or all groups of voters begin to reject the Democrats in the last month before election day, and national polls, and those in every state begin to shift by just a few percentage points. In PA, this is enough to push his actual result ahead of the tiny 0.8ppt polling margin Harris has over him. He wins the state.
Engineered victory: Trump won Pennsylvania because he spent October campaigning there and nowhere else. This campaign activity had an effect in PA - again - just large enough to push him over the 0.8ppt polling deficit he currently holds, but it does nothing anywhere else. Perhaps it even harmed him elsewhere as the other states got less campaign attention.
Now, just by the way the models are built, the scenarios on the “left-hand side” of the chart above will almost exclusively be scenarios where Trump won Pennsylvania for a natural reason. The models start with their best picture of the current race, and those extra parameters then explore the unknowns (polling errors, changes in public opinion in the last month, etc.). Most models do not include among their variable parameters what kind of campaign diary the candidates will keep in the last month before polling day.[1] Instead, they (implicitly) assume that both candidates run the same kind of campaign as candidates have run historically, and derive their voting patterns from those.
This is the main reason that the models are telling us that a win in Pennsylvania leads to a huge increase in the probability of a national win. The scenarios being considered are conditionalised on a candidate having won Pennsylvania “naturally”, and so, by design, they bake in a national trend that was big enough to gain Pennsylvania. In most of these, that candidate will win many other states too.
And so, in turn, it’s a huge mistake to say: “Well then, if we go and concentrate all our attention on Pennsylvania, we will have a very high probability of winning overall.” For then, we are in effect conditionalising on an “engineered” win in Pennsylvania. And this is not the same, at all.
Magic wands and less-than-magic estimates
In fact, we can get a quick read on the difference. Take Trump’s current probability overall; again using The Economist’s model, it’s currently around 45% (and most of the other models agree).
Now, let’s present the Trump campaign with a magic wand which, when they wave it, delivers them Pennsylvania and its 19 EV, but, critically, doesn’t change anything else in the national picture or in any other state. We can roughly see what the incremental gain in probability is just by looking at the chance the model currently assigns to Trump winning at least 251 EV (that is, 19 EV fewer than the 270 he needs for a normal victory). And the answer here is 57%.[2]
So, PA “via a magic wand” gets Trump from a 45% chance of victory to ~57%. And this magic wand scenario, while not a perfect reflection of an “engineered victory”, is a lot closer to one than the scenarios the models are running.
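A toy simulation can make the natural-versus-engineered distinction concrete. The sketch below uses hypothetical swing-state margins, error sizes and a 219 EV safe base for Trump (none of these come from a real model) and computes three numbers from one set of draws: the overall win chance, the chance after selecting draws where Trump won PA “naturally”, and the “magic wand” chance, where PA’s 19 EV are handed to him in every draw and nothing else changes.

```python
import random

# Hypothetical inputs for illustration only (not any published model's figures):
# state -> (electoral votes, Trump-minus-Harris polling margin in points).
SWING = {"PA": (19, -0.8), "MI": (15, -1.5), "WI": (10, -1.0), "GA": (16, 0.5),
         "NC": (16, 0.3), "AZ": (11, 1.0), "NV": (6, -0.2)}
TRUMP_SAFE_EV = 219  # assumed EVs Trump holds outside the swing states
PA_EV = SWING["PA"][0]
NEEDED = 270


def draw_once(rng, national_sd=2.5, state_sd=2.0):
    """Simulate one election; return (trump_won_pa, trump_ev_outside_pa)."""
    shift = rng.gauss(0.0, national_sd)  # shared national trend for this draw
    won_pa, ev_outside_pa = False, TRUMP_SAFE_EV
    for state, (ev, margin) in SWING.items():
        won = margin + shift + rng.gauss(0.0, state_sd) > 0
        if state == "PA":
            won_pa = won
        elif won:
            ev_outside_pa += ev
    return won_pa, ev_outside_pa


rng = random.Random(7)
draws = [draw_once(rng) for _ in range(20_000)]

# As things stand: PA's 19 EV count only in draws where Trump actually wins it.
p_overall = sum(rest + (PA_EV if pa else 0) >= NEEDED for pa, rest in draws) / len(draws)

# "Natural" PA win: keep only the draws where he won PA, national trend and all.
natural = [rest for pa, rest in draws if pa]
p_natural = sum(rest + PA_EV >= NEEDED for rest in natural) / len(natural)

# "Magic wand": hand him PA in every draw and change nothing else, which is the
# same as asking how often he reaches 251 EV without PA.
p_wand = sum(rest + PA_EV >= NEEDED for _, rest in draws) / len(draws)

print(f"overall {p_overall:.0%} | magic wand {p_wand:.0%} | natural PA win {p_natural:.0%}")
```

`p_overall <= p_wand` always holds, since handing over PA can only add EV. `p_wand` falls short of `p_natural` because the wand leaves out the national trend that a natural PA win drags along with it.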
As such, to work out roughly what the payoff of Trump campaigning in PA might be, you can multiply this +12ppt gain in victory probability by whatever probability you assign to a “campaign just there” strategy actually delivering the state. Perhaps this is quite high, in which case it’s a reasonable strategy overall; maybe more so than the alternative of flying between all six or seven swing states and trying to shift the dial ever so slightly in each of them.
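That payoff calculation is a single multiplication. The sketch below treats a PA-only campaign as all-or-nothing (it either delivers PA, worth the +12ppt magic-wand boost, or achieves nothing) and ignores any damage to other states; the delivery probabilities in the loop are hypothetical.

```python
def expected_gain(boost_ppt, p_deliver):
    """Expected change in overall win probability, in points, from a one-state push.

    Crude assumption: the push either delivers the state outright (earning the full
    "magic wand" boost) or achieves nothing; effects on other states are ignored.
    """
    return boost_ppt * p_deliver


# The +12ppt boost for Trump in PA, paired with hypothetical chances that a
# PA-only October campaign actually delivers the state.
for p in (0.25, 0.5, 0.75):
    print(f"P(campaign delivers PA) = {p:.0%}: expected gain = {expected_gain(12, p):+.1f} ppt")
```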
And in fact, I think this comes out as a pretty reasonable strategy for Trump. PA is so essential for him, a +12ppt chance of victory bonus is
What about Harris … in Texas?
Harris is in a fairly similar situation to Trump, but she also has alternative targets. Working out her chances with a “magic wand” delivering Pennsylvania’s 19 EV, you get a 67% chance of victory (up from 55% without it). And a boost of 12ppt is well worth having; perhaps she should indeed camp out there, particularly if, as at present, Vice President Harris doesn’t seem to be anywhere very much.[3]
But there’s another strategy. There are two other states out there that are (just) within reach, and that would raise her chances by a lot more, even if they are harder to get: Florida (30 EV) and Texas (40 EV). If she got one of those via a magic wand, she would increase her chances of victory by +18ppt and +23ppt respectively.[4]
Now it’s a simple (not easy) calculation. Currently she is adrift by 4 points in Florida and 6 points in Texas. Does a full-on campaign by Kamala Harris and Tim Walz in either of those states give her enough of a chance of closing those gaps to be worth it, given the 30 and 40 EV at stake and the 18ppt and 23ppt boosts in victory chances respectively?
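The “simple (not easy)” part can be written as a break-even calculation. Under a crude all-or-nothing assumption (success earns the full magic-wand boost, failure earns nothing), a Texas push beats a PA push only if its chance of success is at least 12/23 (about 52%) of the PA push’s chance; for Florida the ratio is 12/18 (about 67%). The sketch below encodes that using the boosts quoted above; the 50% PA success chance in the example is purely hypothetical, and the hard part, judging whether Harris and Walz could close a 4- or 6-point gap, stays a matter of judgement.

```python
# "Magic wand" boosts to Harris's overall win chance quoted in the text, in points.
BOOSTS_PPT = {"PA": 12, "FL": 18, "TX": 23}


def break_even_delivery(target, p_baseline, baseline="PA"):
    """Probability a one-state push on `target` must have of delivering the state for
    its expected gain to match a push on `baseline` that succeeds with `p_baseline`.

    Crude all-or-nothing assumption: success earns the full magic-wand boost, failure
    earns nothing, and any effect on other states is ignored.
    """
    return BOOSTS_PPT[baseline] * p_baseline / BOOSTS_PPT[target]


# Hypothetical: suppose camping out in PA would have a 50% chance of delivering it.
for state in ("FL", "TX"):
    need = break_even_delivery(state, p_baseline=0.5)
    print(f"{state}: needs at least a {need:.0%} chance of success to match PA")
```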
My suspicion is that the answer is a fairly clear no: both of these “one-state” strategies are super-risky, and neither has enough of a chance of success to be worth more than a more balanced strategy (campaigning across the swing states, and trying to build enough national momentum that “natural victory” conditions start to come back into play). But they’re intriguing nonetheless and, if anything, more justified than a “just PA” strategy.
At least, once you realise that campaigning in one place only raises the probability of an engineered victory, not a natural one.
Vice Presidential picks
It’s worth noting that the very same mistake is being made by those who reason, from the fact that a victory in Pennsylvania takes their model from a 55% chance to a 92% chance of victory, that Harris was insane not to pick Josh Shapiro (the governor of the state) as her Vice Presidential candidate. This is wrong. At best, that pick delivers another kind of engineered victory in PA, not a natural one.
And so, even if we make the extraordinary assumption that VP Shapiro would guarantee victory in PA, it would only raise her chances to 67%, not to 92%. Very useful of course, but a) not decisive, and b) the selection of Shapiro would not have raised her PA chances anywhere near that much, and might in fact have damaged them elsewhere.
Overall then, be very wary of people using these “conditional” probabilities to argue for actions in the campaign. The fundamental problem is that these conditional probabilities are premised on a certain set of dynamics in the race. So when you take action based on them, you’d better be damn sure that those actions are not, in themselves, changing these dynamics in some important way.
Arguments for massive changes to campaign strategy make this mistake. Arguments about VP picks make this mistake. Once you start looking, you find it almost everywhere.
[1] And this is mostly for practical reasons. Even if you did include such variables in a model, you would need to know how state polling would move as a result. And we simply don’t have enough Presidential election data to analyse the relationship between the “signal” and the “response”: we are considering an event that happens only once every four years, and have no counterfactual comparisons to make.
[2] This isn’t quite right (we should really be excluding PA from our model altogether, since there will be correlations), but it’s pretty darn close.
[3] A lot of Hillary Clinton vibes as we come into the last month. Where is she?
[4] These are the increases in chances given by “engineered” victories, obviously. If they were “natural” victories, they would increase Harris’ chances to as near 100% as makes no difference.