Personal Website for Tom Hayden

Posts Tagged ‘fantasy football’

Fantasy Football Stats: Predicting Performance Using Regression

Sunday, September 5th, 2010

In past seasons, I predicted player performance with a very rudimentary recommender system. For each player at each position, I would compute a z-score: a normalized performance number that usually falls between -2 and 2. Then I could use a simple mechanism to predict the following season’s performance. It worked well in a few cases: it was fairly good at finding up-and-coming players and identifying strong performers. However, these players were usually well known anyway.
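A z-score measures how many standard deviations a player’s season total sits above or below the average for his position. Here is a minimal sketch, assuming one position’s season fantasy-point totals are already in a dict. `position_z_scores` is a hypothetical helper and the player names and numbers are made up; this is not the code the old system used.

```python
from statistics import mean, pstdev

def position_z_scores(points_by_player):
    """Return each player's z-score relative to the other players at his position.

    points_by_player: dict mapping player name -> season fantasy points.
    """
    values = list(points_by_player.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation for the position
    if sigma == 0:
        # Everyone scored the same, so nobody stands out.
        return {name: 0.0 for name in points_by_player}
    return {name: (pts - mu) / sigma for name, pts in points_by_player.items()}

# Hypothetical quarterback season totals, for illustration only.
qb_points = {"QB A": 310.0, "QB B": 250.0, "QB C": 190.0}
print(position_z_scores(qb_points))
```

When the totals are roughly normal, about 95% of z-scores land between -2 and 2, which is where that range comes from.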

This year I am taking a different and more complex approach to prediction: basic non-linear regression. I’m computing the non-linear “line of best fit” (really a curve) for each player. This is best explained visually:

The biggest problem is finding the right curve to “fit” the data to. The ones you see above are third-degree polynomials, so they have a characteristic wave shape. I tried a variety of other curves: linear, logarithmic, and second-, fourth- and fifth-degree polynomials. For example, here are the same three players with a fifth-degree polynomial regression:
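A rough sketch of fitting one player’s history with polynomials of several degrees, assuming the data is one fantasy-point total per season, oldest first, and using NumPy’s least-squares `polyfit`. The function names and numbers are hypothetical, and only the polynomial candidates are covered (the logarithmic curve would need its own fit).

```python
import numpy as np

def fit_career_curve(points, degree):
    """Least-squares polynomial fit of season totals against the season index.

    points: fantasy points per season, oldest first (season 0, 1, 2, ...).
    Returns (coefficients with the highest power first, in-sample SSE).
    """
    if len(points) <= degree:
        raise ValueError("need more seasons than the polynomial degree")
    x = np.arange(len(points), dtype=float)
    y = np.asarray(points, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return coeffs, float(np.sum(residuals ** 2))

def compare_degrees(points, degrees=(1, 2, 3, 4, 5)):
    """In-sample sum of squared errors for each degree the data can support."""
    return {d: fit_career_curve(points, d)[1] for d in degrees if len(points) > d}

# Made-up season totals for one player, for illustration only.
career = [180, 230, 260, 300, 290, 320, 270, 310, 330]
for degree, sse in compare_degrees(career).items():
    print(f"degree {degree}: SSE = {sse:.1f}")
```

The in-sample error can only stay flat or shrink as the degree rises, because each higher-degree family contains the lower ones. That is why the error alone can’t pick the curve, and why a fifth-degree fit can look better on past seasons while tracking noise.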

The fifth-degree curve may fit the data in a way I don’t want. For example, look at Drew Brees above: the guy just won the Super Bowl, but if I were to trust the regression, his performance is going downhill. The model fits the data “too well”: it overfits, bending to follow noise in past seasons rather than the underlying trend. So I’m going to opt for the third-degree polynomial regression.

After I run the regression on all players, I can use the results to create a matrix of predictions: for each player and each game this season, a predicted number of points. My worry is that the variance is so large (the regressions reach an R-squared of at most 0.15) that the predictions mean nothing.
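A sketch of the prediction matrix, assuming each player’s history is a list of per-game points indexed by game number and the trend is a cubic. `prediction_matrix` and `r_squared` are hypothetical names and the histories are made up; R-squared here is the standard 1 - SS_res / SS_tot on the games used for the fit.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    if ss_tot == 0.0:
        raise ValueError("R-squared is undefined when every observation is equal")
    return 1.0 - ss_res / ss_tot

def prediction_matrix(history, games_ahead, degree=3):
    """Fit a per-player polynomial trend and extrapolate it over upcoming games.

    history: dict mapping player name -> past per-game points, oldest first.
    Returns (names, matrix, fit_r2): matrix[i, g] is the predicted points for
    names[i] in upcoming game g, and fit_r2[name] is the in-sample R-squared.
    """
    names = sorted(history)
    matrix = np.zeros((len(names), games_ahead))
    fit_r2 = {}
    for i, name in enumerate(names):
        y = np.asarray(history[name], dtype=float)
        x = np.arange(len(y), dtype=float)  # game index: 0, 1, 2, ...
        coeffs = np.polyfit(x, y, degree)
        fit_r2[name] = r_squared(y, np.polyval(coeffs, x))
        future_x = np.arange(len(y), len(y) + games_ahead, dtype=float)
        matrix[i] = np.polyval(coeffs, future_x)
    return names, matrix, fit_r2

# Made-up per-game histories, for illustration only.
history = {
    "Player A": [12, 8, 15, 20, 9, 14, 11, 18],
    "Player B": [5, 7, 6, 10, 4, 8, 9, 6],
}
names, matrix, fit_r2 = prediction_matrix(history, games_ahead=3)
for name, row in zip(names, matrix):
    print(name, np.round(row, 1), "R^2 =", round(fit_r2[name], 2))
```

A low R-squared means the curve explains only a small share of the game-to-game variance, so each cell of the matrix carries wide uncertainty. A cubic also grows without bound when extrapolated far past the data, so predictions late in the season lean hardest on the fitted shape.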

2010 Fantasy Football

Thursday, August 12th, 2010

It is August now, which means I start preparing for this year’s fantasy football season. Now that my old UM blog is defunct, I am going to bombard all of my personal blog readers with insights into this year’s fantasy football season. My insights are mostly academic: I’m interested in statistical analysis and developing machine learning algorithms for this setting. Some of my posts will be fun (and/or interesting) findings and others will probably be more mathematical. All my data is from NFL.com for the 2001-2009 seasons.

Here’s a fun plot that tells you basically nothing but is cool to look at. You can see some interesting patterns: the spacing between points comes from the scoring system (teams tend to score in combinations of 7s and 3s). That outlier on the lower right is Tom Brady’s 59-0 destruction of Tennessee in 2009.

That looks cool, but it doesn’t tell us much of use. What about the numbers? The mean score for a home team is 21.91 and for an away team is 19.39: a difference of 2.52 points, almost a field goal. Both home and away scores have the same standard deviation: exactly 10 points. Does this give any insight into your fantasy team? Not really: your players all play the same number of home and away games. You may get a bump in weeks when more of your players are at home than away, but not necessarily. Interestingly, the two teams’ scores don’t appear to be correlated, so one team having a big day doesn’t imply the other team is having a big day. Conversely, they’re not anti-correlated either (correlation coefficient -0.02, p-value = 0.25).

In case you’re curious, here is what the distributions of home and away scores look like:

Both are fairly skewed to the right and show a strange drop between 10 and 15 points. This is a little surprising: you would expect many teams to end with a score of 14, but I guess teams will try to squeeze in a field goal or something. I’m not sure how to explain this.
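A histogram is just binned counts, so the dip can be checked directly. A small sketch, assuming non-negative integer final scores and 5-point bins; `score_histogram` is a hypothetical helper and the scores are made up. Empty bins are kept so a dip shows up as a low count instead of disappearing.

```python
from collections import Counter

def score_histogram(scores, width=5):
    """Count final scores in bins of `width` points: 0-4, 5-9, 10-14, ..."""
    if not scores:
        return []
    counts = Counter((s // width) * width for s in scores)
    top = max(counts)
    # Include empty bins so a dip (such as 10-14) shows up as a low count.
    return [(lo, lo + width - 1, counts.get(lo, 0)) for lo in range(0, top + 1, width)]

# Made-up scores, for illustration only.
scores = [0, 3, 7, 10, 14, 17, 20, 21, 24, 27, 28, 31, 35]
for lo, hi, n in score_histogram(scores):
    print(f"{lo:2d}-{hi:2d}: {'#' * n}")
```

To test the hunch about 14 specifically, `Counter(scores)[14]` gives how often that exact score occurs.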

Fantasy Football Linear Programming

Saturday, November 28th, 2009

I go through the analytics records for this website from time to time to see how people get to my site and which search terms are popular. Today, while going through them, I found I’m getting hits for:

1- “How Tough is Mas-Colell”. Answer: Tough but feasible.  See my previous post on the standard microeconomics textbook for PhD students.

2- “Fantasy Football Linear Programming”. This one is interesting. I run a mailing list (which has no activity) where people can talk about fantasy football analysis techniques. However, to the best of my knowledge, nobody on the mailing list has used linear programming. I wonder: are people using linear programming to determine a fantasy football roster? If so, awesome.

For those of you who don’t know, linear programming is an optimization technique. In words, it works like this: you have something you need to maximize (or minimize), like profits for a company or points for your fantasy football team, and a set of “constraints”, i.e. limitations on that maximization. For a company, constraints might be that you can only produce a limited quantity of some good, or that producing one good decreases production of another. In a linear program, both the objective and the constraints are linear in the decision variables. I’m not sure how this translates to fantasy football, though: what would these constraints look like?

My best simple formulation of a linear program for fantasy football is something like this:

Maximize Potential Points!

Subject to:

  • Limited number of players at each position (e.g. only 2 quarterbacks)
  • Players are taken by other teams in the draft.
  • Different bye weeks (e.g. you usually don’t want all players with the same bye week)
  • Potential injuries and/or players with chronic injuries (I’m looking at you, Clinton Portis)
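One way to write the list above as an optimization problem is to treat each player as a yes/no choice. The symbols here are my own assumptions, not a standard formulation: $x_i = 1$ if player $i$ is on the roster, $p_i$ is his projected points, $q_i$ is a rough injury probability used to discount those points, $R$ is the roster size, $P_k$ is the set of players at position $k$ with limit $L_k$, $W_w$ is the set of players with bye week $w$, and $B$ caps how many players can share a bye week.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} (1 - q_i)\, p_i\, x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i = R \\
& \sum_{i \in P_k} x_i \le L_k \qquad \text{for each position } k \\
& \sum_{i \in W_w} x_i \le B \qquad \text{for each bye week } w \\
& x_i = 0 \qquad \text{for each player } i \text{ already drafted by another team} \\
& x_i \in \{0, 1\} \qquad i = 1, \dots, n
\end{aligned}
```

Because each $x_i$ must be 0 or 1, this is strictly an integer linear program; relaxing it to $0 \le x_i \le 1$ gives an ordinary linear program.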

There may be a way to build an optimization technique on linear programming. Formulating the program is feasible, but I’m not sure about:

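  • Computational limitations. The calculations for this optimization may be impractical: because each player is either on the roster or not, this is really an integer program, and integer programming is “NP Hard” in general. I’m not sure about the complexity of this particular problem yet.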
  • Points calculations. This is what I’ve been mostly working on the past two years – developing a good prediction algorithm that can account for variation in performance. I’m up to about 50% but this needs to be improved.
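To see why the computation is a worry, here is the simplest possible solver for a toy version of the problem: try every roster. It ignores the draft and injuries; `best_roster` and the player tuple format are hypothetical, and a real attempt would hand the formulation to an integer-programming solver instead. The main point is the number of candidate rosters, `comb(n, k)`, which grows combinatorially with the player pool.

```python
from itertools import combinations
from math import comb

def best_roster(players, roster_size, position_limits, max_same_bye):
    """Exhaustively search every roster of `roster_size` players.

    players: list of (name, position, bye_week, projected_points).
    position_limits: dict position -> max players at that position.
    max_same_bye: max players allowed to share one bye week.
    Returns (total_points, names) for the best feasible roster, or None.
    """
    best = None
    for roster in combinations(players, roster_size):
        pos_counts, bye_counts = {}, {}
        for _, pos, bye, _ in roster:
            pos_counts[pos] = pos_counts.get(pos, 0) + 1
            bye_counts[bye] = bye_counts.get(bye, 0) + 1
        # Skip rosters that break a position limit or the bye-week cap.
        if any(n > position_limits.get(pos, 0) for pos, n in pos_counts.items()):
            continue
        if any(n > max_same_bye for n in bye_counts.values()):
            continue
        total = sum(p[3] for p in roster)
        if best is None or total > best[0]:
            best = (total, [p[0] for p in roster])
    return best

# Made-up player pool: (name, position, bye week, projected points).
pool = [
    ("QB1", "QB", 5, 20), ("QB2", "QB", 9, 18),
    ("RB1", "RB", 5, 15), ("RB2", "RB", 6, 12), ("RB3", "RB", 9, 11),
    ("WR1", "WR", 7, 10), ("WR2", "WR", 6, 9),
]
print(best_roster(pool, 4, {"QB": 1, "RB": 2, "WR": 1}, max_same_bye=1))
# The number of rosters to check, e.g. choosing 15 players from 200:
print(comb(200, 15))
```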

Links

My Blog - I finally gave in and created a blog where I can post about whatever I like.

My Professional CV - This site has all of the relevant professional links about me; go here if you're interested in my academics.

Fun SI Projects

Using Bidding Networks to Search for Exposure in Auctions - Auction 73 Case - This is some work I did in Fall 2008 as a final project for my Networks course at SI. I'm currently trying to see if this is publishable.

Technological Diffusion with Compatibility - This is based off of a model presented at one of Umichigan's STIET lectures this year.