How the Mainstreet FAU Model Works

[Figure: Recent polling published by Mainstreet FAU.]

Over the course of the FAU/Mainstreet Research partnership, we decided to expand the scope of our 2024 project: not just to conduct polling for the 2024 presidential election, but to construct a model for it as well. Many people familiar with our work will remember our 2021 Canada Federal Election model, which predicted the Liberal party's seat count to within just a few seats, correctly showing them coming close to, but falling just short of, a majority.

Forecasting US elections presents different challenges than forecasting Canadian elections. Instead of projecting national and regional votes onto small districts of around 100,000 people, the model must project the winners of large states, each with millions of votes. This changes the problem: swings in small districts are hard to detect in national and regional polling, but swings among groups of millions of voters should be much easier to uncover. However, in the 2016 and 2020 Presidential Elections, state-level polling missed badly, introducing very high error. National polls tend to be more accurate, but they are harder to forecast from, as the presidency is won not by the candidate with the most votes but by the candidate with the most electoral college votes.

Most other US election models use a mix of state polls and national polls to build correlations between different states, which means that state polling errors, as occurred in both the 2016 and 2020 Presidential Elections, can cause huge misses across many states.

That is why FAU/Mainstreet has created a new model to address these issues. The new model is similar to our Canadian models, allowing us to project the national vote down to the state level using demographics from our national poll. The key is determining how different demographic groups in different states vote relative to each other, and then projecting those groups at the state level from a national poll. National polls generally have the largest subsamples, which makes those subsamples more accurate, whereas state polls may have only small survey samples. Even in a state with a large minority population, like Pennsylvania, it is difficult to get a clear sample of Black voters from a state poll with its smaller sample size. A national poll yields a much larger sample of Black voters, but that sample is not representative of Black voters in Pennsylvania specifically. The objective of the model is to estimate how a national sample of Black voters relates to Black voters in Pennsylvania, and to use that relationship to determine how Black voters in Pennsylvania are voting.
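To make the idea concrete, here is a minimal sketch in Python of the kind of relationship involved. It assumes a simple additive offset between a group's national vote share and the same group's state vote share; the group, the numbers, and the additive form are all hypothetical illustrations, not the model's actual parameters.

```python
# Minimal sketch of relating a national subsample to a state subgroup.
# All numbers are hypothetical illustrations, not real poll results.

# From past elections (e.g., exit polls): a group's two-party Democratic
# vote share in a state compared with the same group's national share.
national_black_2020 = 0.90   # hypothetical: Black voters nationally
pa_black_2020 = 0.92         # hypothetical: Black voters in Pennsylvania
offset = pa_black_2020 - national_black_2020  # state-vs-national relationship

# A new national poll arrives with a fresh national subsample reading.
national_black_new_poll = 0.86  # hypothetical national subsample

# Project the state subgroup from the national subsample.
pa_black_projected = national_black_new_poll + offset
print(f"Projected PA Black Dem share: {pa_black_projected:.2f}")  # 0.88
```

The same relationship could equally be expressed as a ratio rather than an offset; the point is only that the state group is projected from the national group via a previously estimated relationship.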

The model does this by using data from past elections. US Census estimates of the population that voted in each state give us the electoral composition of each state. From there, we use exit polls to observe how demographic groups in each state voted. Exit polls are conducted on election day and are usually revised slightly once the results are finalized, but they are not perfect. However, since they are the largest-sample dataset available with state-by-state demographic breakdowns, combining the exit-poll data with the US Census data allows a least-squares analysis to produce a more accurate representation of voting patterns within a state. Once this data has been generated, each demographic group in each state can be related to how that group votes nationally. Then, when a national poll is conducted, the model projects vote shares among the various demographic groups. Additionally, FAU/Mainstreet will conduct state polls for the Presidential Election. These will allow us to see how candidates are doing with different demographic groups and compare that to what the model projects for those groups in that state. If we observe a demographic group consistently voting substantially differently from how the model projects, we can adjust our model, similar to how state polls influence other Presidential Election models. An example would have been Florida in 2020: while the model projected a certain result for Biden among Hispanic voters, if we had conducted state polling of Florida and found Trump consistently making even larger Hispanic gains, adjustments would have been made in the model to account for that.
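One way to read the least-squares step, as we understand it, is as the smallest adjustment to the exit-poll group vote shares such that, weighted by the Census-derived composition of the state's electorate, they exactly reproduce the certified statewide result. Below is a minimal sketch of that interpretation; the group list, compositions, and vote shares are hypothetical, and Mainstreet's actual analysis may differ in detail.

```python
import numpy as np

# Hypothetical example: reconcile exit-poll group vote shares with a state's
# certified result, using the Census-based share each group made up of voters.

# Census-derived composition of the state's electorate (shares sum to 1).
w = np.array([0.78, 0.11, 0.06, 0.05])        # White, Black, Hispanic, Other

# Exit-poll estimate of each group's Democratic two-party vote share.
v_exit = np.array([0.45, 0.90, 0.60, 0.55])

# Certified statewide Democratic two-party share.
r = 0.50

# Least-squares reconciliation: find the smallest adjustment to v_exit so
# the composition-weighted average matches the certified result exactly:
#   minimize ||v - v_exit||^2  subject to  w @ v = r
# This constrained problem has a closed-form solution:
v = v_exit + w * (r - w @ v_exit) / (w @ w)

print("Adjusted group shares:", np.round(v, 3))
print("Weighted total:", round(float(w @ v), 3))  # equals r = 0.50
```

The closed-form line follows from minimizing the squared adjustment subject to the single weighted-sum constraint, so groups that make up more of the electorate absorb more of the correction.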

Although this seems similar to an MRP, there are key differences. An MRP relies on massive sample sizes, numbering in the tens of thousands and usually collected over weeks, to build demographic profiles in each state (or district, as is the case for UK MRP projections) and then projects vote shares from those profiles. Even with their large samples, MRPs suffer from having unrepresentative samples in all of the areas they are projecting results for. They compensate by filling in gaps with similar voters from other areas, but this creates additional error, because voters who look similar demographically may vote very differently. A white voter in the Midwest will vote very differently than a white voter in California, despite both areas having similar population sizes. Even within the Midwest, a white voter from Illinois will vote very differently than a white voter from next-door Indiana, as the voter from Illinois is much more likely to vote Democratic. Substituting similar voters from similar groups therefore introduces a large amount of error into the individual state-by-state projections. Our method avoids this problem, as the partisan differences between electorates in different states are already captured in the 2020 exit-poll data. This model will miss cases where a group in a specific state moves differently than the same group nationally, and an MRP might do a better job of capturing such a swing; but such cases are very rare, and their impact is minimal and confined to the specific state where that group diverged from its national counterpart.

Another important fact about this model is that it is technically a “Nowcast”, meaning a model that does not use any economic fundamentals to project what the vote will look like on election day. This is partly because, while economic fundamentals models are able to predict future vote shares, they are not able to forecast vote shares within different demographic groups, which are far more likely to move differently between elections. Since elections are decided not by the popular vote but by how different demographics vote in different states, using an economic fundamentals model would be counterproductive in building this model. Additionally, while there has been past correlation between economic fundamentals and vote share in Presidential elections, every year is unique, and depending on the economic indicators used, economic data could be either very good or very bad for the incumbent party. In 2020, a model using unemployment data would have predicted a depression-like wipeout for the incumbent party; a model using stock market numbers or disposable income would have shown the incumbent party as very strong. Similar contradictions can be found this year: if unemployment and stock market data are used, the model would believe the economy is very strong, yet we know from polling that roughly half of the US (depending on the poll) believes the country is in a recession. This contradiction, driven by partisans viewing the economy differently depending on whether their preferred party is in power, means those past correlations between economics and vote shares are fading, and the small sample size (only one Presidential Election every four years) means a model using economic factors might not be reliable. Moreover, as the election approaches, the weights on the economic factors in those models are eventually reduced to zero, making them, in theory, exactly the same as the FAU/Mainstreet model: a “Nowcast”. Having a model extrapolate what current polling would translate to in an election held today is extremely valuable, as it shows viewers where the race stands right now, without requiring them to mentally subtract whatever economic factors a model has added in order to see where the Presidential Election stands today.

Using the existing exit polls from 2016 to build the demographic relationships, we projected the 2020 US results, feeding the model the corrected 2020 national exit polls in place of the inaccurate public polling from the days before the election. Lighter colours mean less error. Note that no adjustment was made to account for Evan McMullin’s candidacy in Utah in 2016, leading to the larger-than-expected miss in that state. Additionally, only states that had exit polls (as conducted by the national media) are shown. This was only an issue in 2016, as in 2020 Fox News conducted exit polls in all states.

The largest miss was Utah, as mentioned above. The next largest misses were Iowa, Ohio and Florida, each about 5 points more Republican-leaning than we projected. However, the model correctly predicted Trump wins in Ohio and Iowa, just on a smaller scale. All but three of the projected states were called correctly; the exceptions were Florida (projected Biden +2, actual Trump +3), North Carolina (projected Biden +1, actual Trump +1) and Wisconsin (projected Trump +1, actual Biden +0.6).

While this methodology works well for building a national model, it can also be used within a state to calculate how demographic groups voted in each county and congressional district. The data on how each demographic group votes in those smaller areas can then be compared to how it votes nationally, and how each county and congressional district will vote can be projected from a single national poll. Here is the result of using the same 2016 exit polls to project the 2020 results, using the same methodology as before:

It is notable that the model did very well in Midwestern cities and suburban areas but slightly worse in more rural areas. However, since those areas are small, the state-level errors shown before remain relatively small. It is also notable that the Rio Grande Valley in Texas is darker, as Republicans overperformed the model there, but that trend does not carry over past the Texas state line into New Mexico and Arizona. The model also underestimated the shift among Hispanics in Miami-Dade County, while doing very well in other parts of Florida, such as Tampa, Orlando and Jacksonville. Additionally, the model underestimated Democrats in the Southern suburbs around Dallas, Houston, Austin, Raleigh and Atlanta, but not in other Southern suburbs. The model did exceptionally well in the suburbs around Philadelphia, Pittsburgh, Detroit, Cincinnati, Columbus, Grand Rapids, Milwaukee, and Madison. Since we did not poll in the US in 2020, there is no way to know what adjustments would have been made upon seeing state-level polling. But using this model as a guide, and specifically polling areas that seem to deviate from long-term trends, adjustments can be made during the 2024 election, and interesting changes can be highlighted for subscribers to the model.
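To illustrate how the same projection logic extends below the state level, here is a short hypothetical sketch: county-level offsets learned from past data are applied to a fresh national poll and weighted by each county's demographic composition. The county names, offsets, compositions, and poll numbers are all invented for illustration.

```python
import numpy as np

# Hypothetical sketch: project county results from a single national poll.
# Offsets come from past elections (exit polls + Census); numbers are made up.

groups = ["White", "Black", "Hispanic"]

# New national poll: Democratic two-party share by group (hypothetical).
national_poll = np.array([0.44, 0.88, 0.62])

# Per-county offsets vs. the same group nationally, learned from past data.
county_offsets = {
    "County A": np.array([+0.10, +0.02, -0.01]),   # urban, more Democratic
    "County B": np.array([-0.08, -0.01, -0.12]),   # rural, more Republican
}

# Each county's electorate composition (shares sum to 1).
county_comp = {
    "County A": np.array([0.55, 0.30, 0.15]),
    "County B": np.array([0.85, 0.05, 0.10]),
}

for county in county_offsets:
    group_shares = national_poll + county_offsets[county]   # project groups
    dem_share = county_comp[county] @ group_shares          # weight by comp
    print(f"{county}: projected Dem two-party share = {dem_share:.3f}")
```

Because the offsets are stored per county rather than per state, the same national poll drives every level of geography at once, which is what makes the county and congressional district projections essentially free once the state relationships exist.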

Over the course of the campaign, the model will be updated whenever Mainstreet/FAU releases a new national poll, and House and Senate models will be released later in the summer and likewise updated with national polls. While we will still conduct state polls, we may find results that differ from the model. That is not a cause for concern: it can mean the state poll is off, or it can mean the national numbers have moved. As we get closer to the election, we will be polling nationally almost daily, so there should be very little difference between the model and the state polls.