Context, Chance & Confidence

Our approach is intended to avoid three broad traps people commonly fall into, both when relying on intuition and when running naive analysis: neglecting context, misunderstanding chance and holding opinions with too much confidence.

At each step in our analysis we pay attention to the context of performance. This ranges from high-level considerations such as homefield and quality of competition to details such as down and distance: a 3-yard gain converts a first down on 3rd-and-2 but falls short on 3rd-and-4. Doing this guards against the well-known tendency to discount situational factors when explaining others’ behavior. Adjusting statistics for context gives a much better picture of actual performance.
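The 3-yard example can be made concrete with a down-and-distance success rule. This is a minimal sketch that assumes a common public convention: a play succeeds if it gains 40% of the yards to go on 1st down, 60% on 2nd down, and all of them on 3rd or 4th down. The thresholds and the `is_successful` helper are hypothetical and are not necessarily how our play-success statistic is defined.

```python
# Hypothetical success rule: the fraction of yards-to-go a play must gain,
# by down. These thresholds are an assumed convention, for illustration only.
SUCCESS_FRACTION = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}


def is_successful(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Return True if the play gained enough yardage for its down and distance."""
    return yards_gained >= SUCCESS_FRACTION[down] * yards_to_go


# The same 3-yard gain is judged differently depending on the situation.
print(is_successful(3, 2, 3))  # True: converts 3rd-and-2
print(is_successful(3, 4, 3))  # False: falls short on 3rd-and-4
```

Scoring plays against their situation, rather than counting raw yards, is what lets the same gain count as a success in one context and a failure in another.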

Perhaps the biggest trap we avoid is “outcome bias”, the tendency to evaluate a decision by its ultimate outcome rather than by the quality of the decision given the information available at the time. Our goal is to project future performance, so it is critical that we not over-emphasize wins and losses, given the wide range of chance events that help determine them. Our chief way of doing this is to weight performance statistics by their predictive ability: their ability to predict out-of-sample performance rather than describe in-sample performance. The canonical example is recovered fumbles, which greatly influence the outcome of games yet are completely random. Because they are random, they have no predictive power, and any stat heavily influenced by fumbles recovered (or many other chance events) will carry less weight in our model than it would in a model based on descriptive analysis.
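One way to measure predictive ability is a split-half check: correlate a stat from one set of games with results from a different set of games. The sketch below is an illustration under stated assumptions, not our actual weighting procedure. The helper names (`pearson`, `descriptive_fit`, `predictive_weight`), the odd/even game split, point margin as the target, and the toy data are all hypothetical.

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def descriptive_fit(stat: dict, margin: dict) -> float:
    """In-sample: correlate each team-game stat with the margin of that same game."""
    xs = [v for t in stat for v in stat[t]]
    ys = [v for t in stat for v in margin[t]]
    return pearson(xs, ys)


def predictive_weight(stat: dict, margin: dict) -> float:
    """Out-of-sample: a team's stat in odd-numbered games vs. its margin in
    even-numbered games. Negative correlations are clamped to zero weight."""
    teams = sorted(stat)
    xs = [mean(stat[t][0::2]) for t in teams]
    ys = [mean(margin[t][1::2]) for t in teams]
    return max(0.0, pearson(xs, ys))


# Made-up toy data: two games per team.
fumbles_recovered = {"A": [2, 0], "B": [0, 2], "C": [1, 1], "D": [0, 0]}
fumble_margin = {"A": [14, 0], "B": [0, 14], "C": [7, 7], "D": [0, 0]}
yards_per_play = {"A": [6.0, 6.0], "B": [4.0, 4.0], "C": [5.0, 5.0], "D": [3.0, 3.0]}
ypp_margin = {"A": [10.5, 10.5], "B": [-3.5, -3.5], "C": [3.5, 3.5], "D": [-10.5, -10.5]}

print(descriptive_fit(fumbles_recovered, fumble_margin))    # 1.0
print(predictive_weight(fumbles_recovered, fumble_margin))  # 0.0
print(predictive_weight(yards_per_play, ypp_margin))        # 1.0
```

In this toy data, fumble recoveries explain every game's margin perfectly yet receive zero weight, because a team's recoveries in some games say nothing about its margin in others. Yards per play, which reflects a persistent team trait here, keeps its weight.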

The most distinctive aspect of our ratings is that they grow in confidence as the season progresses, which simply reflects the growing amount of data on which they are based. This approach prevents the rankings from either under-reacting to early-season performance because of pre-season expectations or over-reacting by reading too much into a handful of games. Overall, it keeps us from overconfidence, one of the most important and ubiquitous judgment biases.
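Growing confidence can be illustrated with simple shrinkage: blend a prior (league average or a pre-season estimate) with observed performance, and give the observed data more weight as games accumulate. This is a standard technique swapped in for illustration; the `blended_rating` helper and its four-game `prior_games` setting are hypothetical.

```python
def blended_rating(observed_mean: float, games_played: int,
                   prior_mean: float = 0.0, prior_games: float = 4.0) -> float:
    """Weighted average of a prior and observed performance.

    prior_games (a hypothetical setting) is how many games' worth of evidence
    the prior counts as; as games_played grows, the observed data dominates.
    """
    total = prior_games + games_played
    return (prior_games * prior_mean + games_played * observed_mean) / total


# A team that has averaged +10 so far, measured against a league-average prior of 0.
for n in (1, 4, 12):
    print(n, blended_rating(observed_mean=10.0, games_played=n))
# prints "1 2.0", then "4 5.0", then "12 7.5"
```

After one game the rating moves only a fifth of the way toward the observed result; after twelve games it moves three-quarters of the way, so neither the prior nor a small sample dominates.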

General Approach

We use only four statistics – one each for rushing, passing, scoring and play success. Rather than creating esoteric new stats (not that we aren’t occasionally impressed with those), we focus on “cleaning up” these relatively basic stats and then finding the appropriate weight for them in our model.

Our model is extremely “bottom up”, meaning it is built as much as possible from play-level rather than game-level data. This takes advantage of the greater amount of information about team performance available at the micro level, and helps us avoid mistakenly or unknowingly including noise in our measures.

Some Specific Steps

We begin with play-by-play data. Inputs and considerations include:

  • We adjust all stats for homefield.
  • We discount plays based on game situation (e.g., score difference and time remaining).
  • We normalize each team-game statistic for the opponent and for league-wide performance.
  • We weight each statistic by its ability to predict out-of-sample performance.
  • We discount older weekly performances to account for non-stationarity, since teams change over a season.
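The homefield, game-situation, opponent, and recency adjustments listed above can be sketched for a single statistic, yards per play; the predictive weighting step is left out. Every number here is an assumption: the 0.2-yard home edge, the garbage-time rule (a 17+ point margin inside the last 15 minutes counts at 25%), and the four-week half-life are illustrative placeholders rather than values from our ratings, and all helper names are hypothetical.

```python
from dataclasses import dataclass

# All constants are hypothetical placeholders for illustration.
HOME_EDGE = 0.2             # assumed home-field boost, in yards per play
BLOWOUT_MARGIN = 17         # score difference treated as a lopsided game
FOURTH_QUARTER = 900        # seconds in the final quarter
GARBAGE_TIME_WEIGHT = 0.25  # weight for late plays in lopsided games
HALF_LIFE_WEEKS = 4.0       # a game this many weeks old counts half as much


@dataclass
class Play:
    yards: float
    score_diff: int    # offense score minus defense score before the snap
    seconds_left: int  # seconds remaining in the game


def situation_weight(play: Play) -> float:
    """Down-weight late plays in lopsided games, which say little about team strength."""
    if abs(play.score_diff) >= BLOWOUT_MARGIN and play.seconds_left < FOURTH_QUARTER:
        return GARBAGE_TIME_WEIGHT
    return 1.0


def game_yards_per_play(plays: list[Play]) -> float:
    """Situation-weighted yards per play, built up from play-level data."""
    weights = [situation_weight(p) for p in plays]
    return sum(w * p.yards for w, p in zip(weights, plays)) / sum(weights)


def homefield_adjust(value: float, is_home: bool, edge: float = HOME_EDGE) -> float:
    """Remove half the assumed home edge from the home side; credit half to the away side."""
    return value - edge / 2 if is_home else value + edge / 2


def opponent_adjust(value: float, opp_allowed_avg: float, league_avg: float) -> float:
    """Shift the value by how much stingier (or more generous) the opponent is than average."""
    return value - (opp_allowed_avg - league_avg)


def season_rating(games: list[tuple[float, int]], half_life: float = HALF_LIFE_WEEKS) -> float:
    """Recency-weighted average of (adjusted_value, weeks_ago) pairs."""
    weights = [0.5 ** (weeks_ago / half_life) for _, weeks_ago in games]
    return sum(w * v for w, (v, _) in zip(weights, games)) / sum(weights)


# Made-up numbers: this week's home game and an away game four weeks ago.
recent = game_yards_per_play([Play(8, 3, 2400), Play(2, -7, 1200), Play(30, 28, 120)])
recent = opponent_adjust(homefield_adjust(recent, is_home=True), opp_allowed_avg=5.0, league_avg=5.5)
older = opponent_adjust(homefield_adjust(4.5, is_home=False), opp_allowed_avg=6.0, league_avg=5.5)
print(season_rating([(recent, 0), (older, 4)]))
```

In the usage example, the 30-yard play with a 28-point lead and two minutes left counts at a quarter of full weight, the yardage gained against a stingy defense is credited upward, and the four-week-old game counts half as much as this week's.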


One of our missions with these rankings is to promote better, unbiased reasoning and analysis. It matters far beyond football.