Why Do So Many Statistical Models Disagree?

With the current COVID-19 crisis in full effect and so much being made of predictive models and simulations, I’ve heard people from many different walks of life articulate variations of the same simple idea:

“Why are there so many models regarding the infection rates and death rates from the coronavirus, and why don’t they all agree with each other?”

Photo by Adeolu Eletu on Unsplash

It’s not hard to be bewildered. On the CDC website, for example, there’s a chart comparing 16 individual forecasts:


CDC.gov, Accessed May 26, 2020

FiveThirtyEight, meanwhile, has a chart comparing nine models:

FiveThirtyEight, Accessed May 26, 2020

This forecast hub from the Reich Lab, in the Department of Biostatistics and Epidemiology at the University of Massachusetts Amherst, has a chart that combines over 20 models:

Reich Lab, Accessed May 26, 2020

All of these combinations of models tend to show a range of estimates for various statistics. The above images all focus on the projected number of deaths, and as you may be able to see, one model (JHU, which is Johns Hopkins University) suggests a far more rapid rise in deaths than any other. By contrast, the models from the US Army Engineer Research and Development Center and the Georgia Institute of Technology are among the most conservative, suggesting that the number of deaths four weeks from now will grow by only about 10,000, rather than the 70,000+ Johns Hopkins is suggesting or the 20,000+ most other models are suggesting. (Let us hope the more conservative models are correct!)

What’s important to understand is that all of these models are:

  • Built by really smart people who have significant training in statistics and data science
  • Based upon huge amounts of historical data
  • Based upon knowledge of prior epidemics for similar illnesses
  • Based upon a modeling technique that is well-understood within the field
  • Based upon differing assumptions about human behavior

That last point is really important, because the CDC website goes into considerable detail explaining those assumptions, and variations in presumed behavior provide a powerful explanation for the different scenarios these models predict.

For example, the Georgia Institute of Technology model assumes that interventions are making a big difference in curbing the spread and fatality of the virus:

CDC.gov, Accessed May 26, 2020

Whereas Johns Hopkins University suggests that once shelter-in-place restrictions are lifted, the virus is going to spread pretty rapidly:

CDC.gov, Accessed May 26, 2020

So, one model is what we’d call optimistic, and the other is what we’d call pessimistic. Both present a fairly extreme case which is probably (hopefully?) not grounded in reality, but which does provide guidance for the future if its assumptions happen to be correct.

You might argue that the truth probably lies somewhere between these two extremes, and that’s precisely what the other models set out to show. Some account for state-by-state re-openings, some include mobility data, some predict gradual easing of social distancing and some ignore the interventions altogether to utilize other predictive criteria.

And this provides us with an excellent jumping-off point for understanding why predictive models and forecasts are both useful and frustrating. They present a useful glimpse into a possible future, but they can also become bewildering and downright aggravating if the future they foresee doesn’t reflect what actually happens because of something they didn’t (or couldn’t!) anticipate.


Let’s begin by defining what a model is, because the term is actually quite imprecise and describes a whole spectrum of conceptual designs, including thought experiments, explanations of processes and detailed algorithms derived from mathematics or statistical analysis. Simply put, a model is an attempt to portray a complex idea using a simpler one.

Consider for a moment the distinction between the Taj Mahal and this Lego representation of it:

Source: Wikimedia Commons
Source: Lego

The Lego model is designed to capture and evoke many of the significant details of the actual structure, and at nearly 6,000 pieces, the kit offers an extremely complex representation of one of the most famous and beautiful buildings in the world. But even so, there are huge compromises that have to be made for the Lego model to exist. Not every element is precisely to scale, and the focus is on replicating the exterior, though with a significant reduction of the fine detail visible on the actual structure.

The model is a great representation of the Taj Mahal that’s instantly recognizable as the real destination, but it’s nowhere near accurate beyond a very general rendering of the building’s shape, structure and appearance. Part of this is because there’s an assumption on the part of the Lego Group that kit builders want general accuracy but don’t need every detail to be a perfect replication of reality (which is definitely true), but part of this is also because there’s an assumption on the part of the kit’s designers that this Lego model has to have some studs and bricks showing to be recognizable as a Lego recreation in the first place. These assumptions shape what the ultimate model will look like and explain why it doesn’t need to be perfect. (After all, even built at 1:1 scale, a Lego Taj Mahal wouldn’t be a perfect replica – it would have its own distinctive qualities and quirks, and it would only appear accurate from a very long way away.)

The types of models we use to explain or predict reality function in the same way. We don’t want to try to account for every facet of reality because we’ve already got reality, and it’s terribly complex! We instead want to simplify things so we can try to distill them down and understand them.

Many of us understand this instinctively when we hear or make comments like, “I’ve found that 20% of my customers are responsible for 80% of my sales.” This dynamic – often referred to as the Pareto Principle or the 80/20 rule – is a simplistic model which suggests that most service organizations have a small group of core customers and expend great time and effort serving customers they’ll rarely see again. The principle has proven to be pretty empirically sound in general, but it doesn’t describe any particular business precisely. Many organizations find that 80% is a good approximation for their core customers’ contributions, but the true figure is usually a little below or above that. The 80/20 rule is great for evaluating ideas in a meeting, but it’s not tremendously useful for making decisions where a more specific number is required.
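To make that concrete, here’s a minimal Python sketch (using randomly generated, purely hypothetical sales data) that checks what share of revenue the top 20% of a simulated customer base actually accounts for:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical data: simulate annual sales for 1,000 customers. A
# lognormal distribution gives the long right tail typical of
# customer revenue -- a few big spenders, many small ones. The sigma
# here is chosen arbitrarily to make the tail fairly heavy.
sales = rng.lognormal(mean=4.0, sigma=1.7, size=1_000)

# Sort customers from biggest spender to smallest.
sales_sorted = np.sort(sales)[::-1]

# Share of total revenue contributed by the top 20% of customers.
top_n = int(0.2 * len(sales_sorted))
share = sales_sorted[:top_n].sum() / sales_sorted.sum()

print(f"Top 20% of customers account for {share:.0%} of sales")
```

Run it with different seeds and the answer hovers somewhere near 80%, a little above or below depending on the draw – which is exactly the point: the 80/20 rule is a serviceable simplification, not a precise description.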

For now, let’s leave aside the theoretical and operational models that are often descriptive of a process or series of behaviors, as these are often academic and abstract, and they tend to reflect a combination of both qualitative and quantitative research combined with a considerable amount of thought and scholarship.

Let’s instead focus on the models which result in a mathematical expression of something, which are often built through univariate or multivariate statistical methods including regression, structural equation modeling (SEM), cluster analysis and factor analysis. I would suggest these can be broken down into a few different categories by application:

  • Descriptive models locate connections between existing data to provide a deeper understanding of what the data have to say. Segmentation systems, perceptual models and factoring to collapse many variables into a handful of constructs fall under this category.
  • Predictive models tend to focus on utilizing independent variables to predict the outcome of dependent variables. They’re often based on known data but not strictly tied to time. These can range from very simple regression equations (see the short sketch below) to sophisticated forecasts to quite complex models for recommender systems like you’d see on Amazon, YouTube or Netflix.
  • Longitudinal models tend to utilize existing time-series data to predict a curve, which can be useful for forecasting or anticipating the natural decline of a product or service.
  • Causal models tend to instead focus on utilizing independent variables to understand or manipulate the outcome of dependent variables. This is what you often see in medical studies where the effects of interventions need to be understood.

(This well-written summary from Kevin Gray is worth a bookmark if you need to refer back to some of the terminology or concepts above, and he has a good piece on models as well.)
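To ground the predictive category in something tangible, here is a minimal sketch of the very simplest case on that spectrum: an ordinary least-squares regression fit with NumPy. The variables and figures are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: predict weekly sales (dependent variable) from
# advertising spend (independent variable), both in $ thousands.
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sales = np.array([12.1, 14.8, 18.2, 19.9, 23.5, 25.7])

# Fit a first-degree polynomial, i.e. a simple linear regression.
slope, intercept = np.polyfit(ad_spend, sales, deg=1)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")

# The fitted line is itself the model: two numbers standing in for
# whatever messy real-world process actually links spend to sales.
print(f"predicted sales at $7k spend: ${slope * 7.0 + intercept:.1f}k")
```

Everything said about models above applies even here: the straight line is a deliberate simplification, and it’s only as trustworthy as the assumption that the relationship stays linear outside the observed range.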

With so many types of models and applications, we can already begin to see why different models from different teams of researchers might differ pretty widely in what they describe. But there’s even more to it than that. Models will perform differently for a whole host of other reasons, including:

  • They may have significantly different data used in their development, resulting in exaggerations of certain aspects of their results
  • They may have significantly different assumptions used in their development, resulting in further distortion of their results
  • They may utilize significantly different statistical methods (or incorporate other methods such as machine learning), resulting in even further distortion of their results
  • Some models may be higher or lower in quality than their peers, resulting in idiosyncrasies which are difficult to explain without spending a lot of time trying to understand the models themselves

That’s a lot to consider! But we can understand it pretty simply if we return to our Taj Mahal example above. Let’s consider the following models:

Source: CubicFun
Source: Wrebbit 3D
Source: Papernano

Each of these models offers a representation of the Taj Mahal, but each is quite different! One isn’t set on a square foundation. One exaggerates the size of the foundation and simplifies the number of arches. One makes the towers on each corner much fatter than they should be. All of them (the Lego kit included) are different colors. None does a good job of representing the domes faithfully.

And yet… these models are all obviously of the Taj Mahal, aren’t they? And even though none is as detailed or intricate as the Lego model, they all do a fairly good job of representing the general idea. We can see distortion in the colors, shapes, scale and other components of the representation, but we could examine all of these models together in aggregate and get a pretty good understanding of what the Taj Mahal looks like by comparing their similarities.

And that’s where we come to what is often called an ensemble model, where several different models are combined to attempt to understand the most likely truth. It’s sort of like what Rotten Tomatoes or Metacritic do for movie reviews, taking varied methodologies and results and combining them together to compute a more likely outcome. And while an ensemble approach is itself burdened with many limitations, the reality is that it often produces a range of scenarios where the extremes are more obvious and the probable results tend to be reflected by many models.

Typically, ensemble models are useful for examining chaotic systems that have some sort of directional progression, like storm systems or hurricane trajectories or polling data or infection rates. They can be used to determine outcomes which are the most probable and which represent the less probable extremes. (While this may sound a lot like a Bayesian approach, it’s similar in application, but different in execution.)

Ensemble approaches tend to be strongest when they incorporate different models utilizing different data sources to predict the same outcome. They are also a common tool in machine learning, where techniques such as stacking, boosting, blending and bagging allow a data set to be recombined and manipulated in different ways to provide stronger predictive accuracy. The end result is that you’re not relying on one set of assumptions to guide a single probable outcome, but on a variety of sets of assumptions that allow you to see the range of results, which tend to bunch up around the most probable course.
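As a toy illustration (the model names and numbers here are invented, and real epidemiological ensembles are far more sophisticated), here’s the basic mechanic of combining several forecasts to surface both the probable middle and the extremes:

```python
import numpy as np

# Hypothetical data: five models' four-week-ahead forecasts of the
# same cumulative count, each built on different data and assumptions.
forecasts = {
    "model_a": 110_000,
    "model_b": 118_000,
    "model_c": 121_000,
    "model_d": 125_000,
    "model_e": 160_000,  # the pessimistic outlier
}

values = np.array(list(forecasts.values()))

# A naive ensemble: the median serves as the central estimate, while
# the min and max show how far the extreme assumptions reach.
print(f"ensemble median: {np.median(values):,.0f}")
print(f"range: {values.min():,.0f} to {values.max():,.0f}")
```

Real ensembles go well beyond taking a median – weighting models by past accuracy, stacking a meta-model on top, or bagging resampled data – but the payoff is the same: a range of scenarios rather than a single guess.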

(That is, of course, unless you’re an unusual thinker like President Trump during Hurricane Dorian in September 2019, where he clearly rejected the probable models and added in his own extremely pessimistic prediction with a black Sharpie marker. But as it happened, the ensemble models were right about the storm’s eventual path, and Alabama was spared any ill effects.)


Should we trust statistical models? That’s a big question, and the answer is generally going to be a “yes, but…” unless a single model is going to be used to justify some consequential decision, in which case the answer will flip to a “no, unless…”.

In the “yes” scenario, we need to remember what a model is supposed to be used for – to help us to develop an understanding of a complex idea. A good model helps us to isolate what’s really important and identifies what we really need to understand to get the gist of an idea. But even good models require refinement, and it’s crucial that we understand the assumptions that go into creating the models before we accept their results.

In the “no” scenario, we should be quite wary of accepting a single model’s prediction if we’re going to be making decisions based upon it that go beyond a broad understanding of what it represents. In that case, we need to review multiple models or take an ensemble approach to combine several models together. We also need to be wary of being partial to the model that gives us the most favorable outcome or which suits our biases best – instead, we need to prepare for what’s probable and have a contingency plan in place for what the extreme cases suggest.

Photo by Mark König on Unsplash

Consider the weather for a moment. The field of meteorology has long been engaged in utilizing historical data and forecast models to attempt to predict the weather, and this has evolved over time from offering fairly general regional forecasts to offering hyperlocal minute-by-minute forecasts. Considerable time, resources, technology and brainpower have gone into making weather forecasts more precise and accurate, and yet they’re still wrong a non-trivial amount of the time, and forecasts from different sources often vary in what they predict. But how you utilize those weather forecasts differs entirely based upon the decision you’re going to make.

If you’re just wondering whether you need to bring an umbrella on an outdoor excursion, a single forecast will do: it’s more likely to be correct than not, and the decision you’re trying to make isn’t of any great consequence. If you take an umbrella and there’s no rain, no big deal. If you skip the umbrella and there’s some sporadic rain, you might, at most, consider a different weather source for your next outdoor adventure (or simply decide to bring an umbrella along in the future regardless of the forecast, just in case!).

But let’s say you’re planning an outdoor event and you need to know whether or not to move it indoors because there’s a possibility of heavy rain. In this case, you’re going to want to consult multiple forecasts and have a good understanding of what the most probable outcome will be. If one source is predicting all-day rain, another’s predicting no precipitation, and most are predicting occasional showers, you’ll want to be prepared for things to get wet at some point during the day. Even though the optimist in you might want to chance that the extreme case of all-day sunshine is possible, the pragmatist in you should accept that it’s probable your event is going to need to move inside at some point.
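In code, that consult-several-forecasts decision might look something like this minimal sketch (the sources and probabilities are made up for illustration):

```python
# Hypothetical data: rain probabilities from several forecast sources
# for the day of the outdoor event.
forecasts = {
    "source_a": 0.90,  # predicts all-day rain
    "source_b": 0.05,  # predicts no precipitation
    "source_c": 0.45,  # occasional showers
    "source_d": 0.50,  # occasional showers
    "source_e": 0.40,  # occasional showers
}

# The median forecast is a reasonable stand-in for the probable case.
probs = sorted(forecasts.values())
median_prob = probs[len(probs) // 2]

RAIN_THRESHOLD = 0.30  # our tolerance for the event getting wet

# Plan against the probable case, not the case we'd prefer.
if median_prob >= RAIN_THRESHOLD:
    print("Book the indoor backup space.")
else:
    print("Stay outdoors, but keep watching the forecasts.")
```

The threshold is the decision-maker’s call, not the models’: it encodes how costly a soaked event would be compared with the hassle of moving indoors.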

In the end, the fact that these models predict different outcomes isn’t an indictment of their quality. No model will ever predict the future with complete accuracy, because the systems governing actual outcomes are complex and chaotic. But we can at least take solace in narrowing things down to what’s most likely to happen, and that allows us to anticipate how we’ll respond far more capably than if we were simply guessing from our own experience.


We hope this article has been helpful to you, and we want you to know that we’re here to be a resource however we can be on anything you’d like to know about marketing research!

Please feel free to check out our other articles, watch our YouTube channel, connect with us on LinkedIn or Facebook, or contact us directly. We’d love to hear from you!