Are Computers and "Big Data" Making Vegas Smarter? Some Insights from March Madness Data
- CitizenAnalyst
- Mar 19
- 8 min read
You'd have to be living under a rock not to have noticed the incredible rise of sports betting over the last 5-10 years. COVID almost certainly threw gasoline on the fire, and since most of us have friends or family who watch sports, you have probably noticed that at this point, betting has become almost ubiquitous. Though this feels like it's probably going to be a problem down the road, as with most things, there is good and bad mixed in. In the case of sports betting, perhaps the best "good" is that it gives the "wisdom of crowds" another avenue to flex its muscle. Many people who don't care about sports might have been exposed to these concepts for the first time during the recent presidential election, when prediction markets not only correctly predicted a Trump win, but also sniffed out that win almost as soon as the Georgia results started coming in (one of the first states to report that night). For those who care about markets and behavioral economics, betting markets are utterly fascinating to analyze, so while sports betting may be fraught with social problems (corruption of the games, increases in gambling addiction, etc.), as an economic experiment it is no less interesting to study.
One thing that I have been curious about is whether or not computers and big data have made it more difficult to make money betting on sports. The answer to me seemed obvious: of course! How could sophisticated computers and the engineers who build the algorithms on them not have figured it out by now?
Fortunately, I stumbled across a data set that has the betting lines from all of the NCAA tournament games going back to at least 1985 (the first year the tournament expanded to 64 teams). So then I had a thought: why not compare Vegas' hit rate (which I define as the percentage of games the Vegas favorite wins) and the final margins (relative to the predicted spreads) from old NCAA tournament games to current ones? My thought was that in 1985, and probably for a good 5-10 years after that, the influence of computers on betting was significantly less than it is today. That had to make the lines much more inefficient, and thus it had to be easier to make money betting. Today, by contrast, anyone can access box score data from every college basketball game for the entire season and, if they wanted to, quickly become the next Ken Pomeroy (founder, of course, of the famous basketball analytics site KenPom). Access to this data and to programs that can analyze it should make the average bettor smarter, at least in theory.
To test this, I looked at first-round games from the men's NCAA tourney from 1985 through 1994, and then again from 2019 through 2024. Eventually I'll do this for all years and for all tourney games, but for now, since there are 32 first-round games each year, I've got 320 games from the "pre-computer" era and 160 from the "internet / big data" era. That should be more than enough to constitute a decent sample size.
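For anyone who wants to follow along, here's a minimal sketch of how that slicing might look in Python. The file and column names (ncaa_tournament_lines.csv, year, round) are hypothetical placeholders, not the actual data set's schema.

```python
import pandas as pd

# Hypothetical file and column names -- the real data set's schema may differ.
games = pd.read_csv("ncaa_tournament_lines.csv")

# Keep only first-round games from the two eras being compared.
first_round = games[games["round"] == 1]
pre_computer = first_round[first_round["year"].between(1985, 1994)]
big_data_era = first_round[first_round["year"].between(2019, 2024)]

print(len(pre_computer), "pre-computer era games")          # expect 320
print(len(big_data_era), "internet / big data era games")   # expect 160 (2020 was canceled)
```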
What I found from this was actually pretty surprising: Vegas does not seem to be getting any better at predicting either the winners of games or the final margins. It actually seems to be getting worse. Below is a table of Vegas' hit rate and then the standard deviation of the final margins relative to the predicted spreads. I'll show the data in chart form after that.

The table shows that in the first ten years after the NCAA tournament expanded to 64 teams, Vegas correctly predicted the winners of first round games almost 80% of the time. Additionally, their predicted spreads were off by about 6 points compared to the final score (so if St. John's was favored by 4, but won by 10, the "miss" here would be 6 points, just as it would be if St. John's was favored by 4 and lost by 2). This compares to the 2019-24 period where Vegas was only predicting the winner about 70% of the time. And similarly, spread "misses" were closer to 7 points on average across a typical tourney year's first round games, or almost a full point higher than in the pre-Internet era.
Now here are two charts showing the same data over time, along with averages from the pre-Internet and Internet eras. For ease of viewing, I've changed the colors of the bars to orange for the 2019-24 era, though it's the same series as the blue bars in each chart. Also note that 2019-24 is actually only 5 tournaments, since the 2020 tournament was canceled due to COVID.

Now here's the chart for the standard deviation of Vegas' "miss" with its spreads. To reiterate, this is the standard deviation of the amount (in points) by which Vegas' spreads "miss." For example, if St. John's is favored in a game by 4 and they lose by 6, the "miss" was 10 points. If they were 2-point underdogs and they win by 3, the "miss" would be 5. I then take the standard deviation of this "miss" across each year's 32 games.
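In code, the two metrics might look something like the sketch below, which builds on the hypothetical data frames from the earlier snippet and assumes made-up column names (favorite_score, underdog_score, spread, with spread expressed as the number of points the favorite is laying).

```python
def hit_rate_and_miss_sd(df):
    # Actual margin from the favorite's perspective.
    actual_margin = df["favorite_score"] - df["underdog_score"]
    # Hit rate: share of games the Vegas favorite actually won.
    hit_rate = (actual_margin > 0).mean()
    # Signed miss: actual margin minus the predicted spread
    # (e.g. favored by 4, lose by 6 -> a miss of 10 points in magnitude).
    miss = actual_margin - df["spread"]
    return hit_rate, miss.std()

for label, era in [("1985-94", pre_computer), ("2019-24", big_data_era)]:
    rate, sd = hit_rate_and_miss_sd(era)
    print(f"{label}: hit rate {rate:.1%}, spread-miss std dev {sd:.1f} pts")
```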

There are several possible explanations for why Vegas' predictive accuracy might be degrading like this.
First, we should remember that "Vegas" is generally just a market maker, and it's trying to set the odds so that it has roughly the same amount of money on each side. Its goal, in theory, is not to predict the winner or loser, or the score. It's trying to predict what the public thinks about those two things, such that a similar amount of betting dollars lands on each side. The odds are set so that the book hopefully captures a chunk of the money it takes in regardless of who wins or loses (this is called the "vig"). Since one of the two teams has to win, in theory the implied probabilities of each team winning should add up to 100%. But if you do the math on the posted odds, they never do; they always add up to more. The difference between that sum and 100% is the Vegas "vig."
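To make that concrete, here's a small sketch of how American odds translate into implied probabilities and how the vig falls out of them. The -110/-110 pricing is just the standard price on a point-spread bet, not odds from any actual game.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American (moneyline) odds to an implied win probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

# Illustrative pricing: both sides of the spread at -110.
favorite, underdog = -110, -110
total = implied_prob(favorite) + implied_prob(underdog)
print(f"Implied probabilities sum to {total:.1%}")   # ~104.8%, not 100%
print(f"Overround ('vig'): {total - 1:.1%}")          # ~4.8% built in for the book
```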
In that sense, maybe we're asking the wrong question: if Vegas is just trying to get a balanced book, who cares if their hit rates and spread "misses" are off? As long as half the money is on each side of the book, why should a bookie care if their favorites and spreads miss more of the time? The counterargument to this of course is that if Vegas was getting more accurate at predicting final scores, it would more easily allow it to take "prop bets" and run unbalanced books to make even more money. But more on this in a minute.
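And here's the balanced-book arithmetic spelled out with toy numbers, assuming an equal $110 wagered on each side at -110.

```python
# Toy balanced book: $110 wagered on each side of a spread priced at -110.
stake = 110          # amount wagered on each side
win_profit = 100     # profit owed on a winning $110 bet at -110

handle = 2 * stake               # $220 collected in total
paid_out = stake + win_profit    # winner gets their stake back plus $100 profit = $210
print(f"Book keeps ${handle - paid_out} no matter which side wins")  # $10, ~4.5% of the handle
```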
This brings us to Possible Explanation #2: the proliferation of sports betting has actually made markets less efficient (at least in a sense). The logic here goes like this: sports betting becoming ubiquitous has made it far easier to bet, and this results in more "rookies" entering the fray. This dilutes the significance of "pros" as a percentage of the total betting pool. This is essentially an argument that the current betting public is dumber than it used to be, whether because they do less homework, are less experienced, or both, and that the "dumb money" is potentially crowding out the "smart money." As a fervent believer in the "wisdom of crowds," I'm skeptical of this, particularly because Americans are absolutely obsessed with college sports. Football and basketball can be complicated, but they're not rocket science. Most sports fans can tell a good team from a bad one pretty quickly, after all. Thus, assuming people are dumb when wagering on zero-sum outcomes is almost always foolish, and that is likely to be the case here as well. This explanation is possible then, but I'm skeptical of it, particularly because of possibility #3.
Possibility #3 is that Vegas has gotten worse at predicting the winners and winning margins because there is a lot more parity in college basketball now than there has been in the past. Said differently, if there used to be much bigger gaps between the 1st and 2nd tier teams and the 3rd and 4th tier teams, it would have been easier to predict who would win (though this doesn't necessarily explain Vegas' lesser predictive powers when it comes to scores and spread "misses"). There's good evidence to suggest that college basketball parity has increased significantly compared to the late 80's and throughout the 90's. This might be making it more difficult not only to predict the winners of games, but also to predict the spreads with any accuracy.
Consider the charts below, which show the standard deviation and variance of SRS scores for each college basketball season going back to 1984-85. As we've highlighted in prior posts on college basketball, the SRS (Simple Rating System) is a pretty solid way of evaluating a sports team (while we've only used this system for college basketball, it's applicable to any sport). As a reminder, SRS scores are made up of two components: Margin of Victory (MOV) and Strength of Schedule (SOS). Margin of victory is the average number of points the team in question beats its opponents by. Strength of schedule measures how good those opponents were, in its simplest form the average MOV of all of the team in question's opponents (so how many points, on average, all of Team A's opponents won their games by). These two components are then added together to get an SRS score.
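For illustration, here's a minimal sketch of one common way SRS-style ratings are computed, run on a tiny made-up set of results. The strength-of-schedule term is refined iteratively so that it reflects opponents' ratings rather than just their raw MOV; the games and team names below are purely illustrative.

```python
import numpy as np

# Made-up results: (team_a, team_b, margin for team_a). Real data would have
# hundreds of games per season.
games = [("Duke", "UNC", 5), ("UNC", "Wake", 12), ("Wake", "Duke", -8), ("Duke", "UNC", -3)]

teams = sorted({t for g in games for t in g[:2]})

# Margin of victory: each team's average point differential.
margins = {t: [] for t in teams}
opponents = {t: [] for t in teams}
for a, b, m in games:
    margins[a].append(m)
    margins[b].append(-m)
    opponents[a].append(b)
    opponents[b].append(a)
mov = {t: np.mean(ms) for t, ms in margins.items()}

# Start with SRS = MOV, then repeatedly set SOS to the average rating of each
# team's opponents until the ratings settle down.
srs = dict(mov)
for _ in range(100):
    srs = {t: mov[t] + np.mean([srs[o] for o in opponents[t]]) for t in teams}

print({t: round(r, 2) for t, r in srs.items()})
```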
If we pull the SRS score for every team in every college basketball season back to 1984-85, we can look at the variance, standard deviation and average of SRS scores for each season. Generally speaking, more parity in college basketball would mean lower variances and lower standard deviations. Higher standard deviations and variances, by contrast, would mean less parity (i.e., more teams that are really good and more teams that are really bad). In other words, parity means more teams bunched closely together in ability.
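Computationally this is straightforward; a sketch along these lines would do it, assuming a hypothetical long-format table (cbb_srs_by_season.csv) with one row per team-season and columns season and srs.

```python
import pandas as pd

# Hypothetical long-format table: one row per team-season.
srs_df = pd.read_csv("cbb_srs_by_season.csv")

# Spread of team quality within each season: lower = more parity.
parity = srs_df.groupby("season")["srs"].agg(["std", "var", "mean"])

# 5-year rolling averages to smooth year-to-year noise, as in the second chart below.
parity_smoothed = parity[["std", "var"]].rolling(window=5).mean()

print(parity.tail())
print(parity_smoothed.tail())
```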
With all that said, consider the following two charts: the first showing the standard deviation and variances of SRS scores, and then the second showing 5 year rolling averages of each of those items, to smooth things out.


Both of these charts show that the standard deviation and variance of SRS scores across college basketball have been going down in recent years, though that may be starting to reverse with conference consolidation. This year actually seems to have had the lowest level of parity in men's college basketball since 1994-95, which is pretty significant (though not surprising given our recent post on the SEC being historically good this year, and how good the Big 12 and Big 10 appear to be too, at least as measured by SRS).
Why does this matter? Because the closer a hypothetical team A is in ability to a hypothetical team B, the harder it becomes to figure out who is going to win. More parity in college basketball therefore makes it harder to predict not just the winner, but also the winning margin. Consequently, even if there is a lot of new "dumb money" in the market, if parity is eroding Vegas' ability to predict spreads and scores, the books are also going to be less willing to take prop bets or tolerate an unbalanced book when the money is really flowing in on one side or the other, which is what we might have expected them to do under Possibility #2 if they were getting smarter at predicting. Thus, while difficult to prove as the main culprit, parity seems likely to be at least a key source of the apparent decline in Vegas' (and by extension, the public's) predictive power in the men's NCAA tournament. It might also explain why the books aren't willing to take on more prop bets today than they did in the past, and why it's seemingly easier to move college basketball lines than lines in other sports (even college football), though depth of liquidity is no doubt a reason for that too.
The expansion and proliferation of sports betting markets is in some ways still very much in its early stages. In future posts, we'll expand on concepts discussed in this analysis and see if we can't get even more satisfying answers. In the meantime, enjoy the best time of the sports year this weekend!