Fantasy Football Stats: Predicting Performance Using Regression
Sunday, September 5th, 2010
In past seasons, to predict player performance I used a very rudimentary recommender system. For each player at each position, I would compute a z-score. This is basically a normalized performance number, usually between -2 and 2. Then, I could use a simple mechanism to predict the following season’s performance. It worked well in a few cases: it was fairly good at finding up-and-coming players and identifying strong performers. However, these players were usually well known anyway.
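To make the z-score idea concrete, here is a minimal sketch in Python with NumPy. The season totals are made-up numbers and the function name `z_scores` is my own; the only assumption is that players are standardized against the others at the same position.

```python
import numpy as np

def z_scores(points):
    """Standardize season totals within one position group."""
    points = np.asarray(points, dtype=float)
    std = points.std()
    if std == 0:
        # Every player scored the same, so nobody stands out.
        return np.zeros_like(points)
    return (points - points.mean()) / std

# Hypothetical season totals for five players at one position.
season_points = [310.0, 275.0, 240.0, 198.0, 150.0]
scores = z_scores(season_points)
print(scores)
```

A positive score means the player was above the positional average, a negative one below it. Most values land between -2 and 2, but nothing forces them to stay in that range.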
This year I am taking a different and more complex approach to prediction. I’m going to do some basic non-linear regression. Basically, I’m computing the non-linear “line of best fit” for each player. This is best explained visually.
The biggest problem is finding the right curve to “fit” the data to. The ones you see above are for third-degree polynomials, so they have a characteristic wave shape. I tried a variety of other curves: linear, logarithmic, second-degree polynomial, fourth-degree polynomial and fifth-degree polynomial. For example, here are the same three players but with the fifth-degree polynomial regression:
The fifth-degree polynomial may fit the data in a way which I don’t want. For example, look at Drew Brees above. The guy just won the Super Bowl, but if I were to trust the regression, it tells me that his performance is going downhill. The model fits the data “too well”; in other words, it overfits. So I’m going to opt for the third-degree polynomial regression.
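Here is a rough sketch of how the degree comparison could be done in Python with NumPy. The per-game points are invented, and the hold-out check (fit on the early games, score the last few) is my own addition for illustration rather than something I ran for the charts above.

```python
import numpy as np

# Hypothetical per-game fantasy points for one player (made-up numbers).
games = np.arange(1, 17, dtype=float)
points = np.array([12, 18, 9, 22, 15, 25, 11, 19,
                   24, 14, 27, 20, 16, 23, 18, 21], dtype=float)

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns a callable np.poly1d."""
    return np.poly1d(np.polyfit(x, y, degree))

def in_sample_sse(x, y, degree):
    """Sum of squared residuals on the same games used for fitting."""
    model = fit_polynomial(x, y, degree)
    return float(np.sum((y - model(x)) ** 2))

def holdout_rmse(x, y, degree, n_holdout=4):
    """Fit on the early games, then score the last n_holdout games."""
    model = fit_polynomial(x[:-n_holdout], y[:-n_holdout], degree)
    residuals = y[-n_holdout:] - model(x[-n_holdout:])
    return float(np.sqrt(np.mean(residuals ** 2)))

for degree in (1, 2, 3, 4, 5):
    print(degree,
          round(in_sample_sse(games, points, degree), 1),
          round(holdout_rmse(games, points, degree), 1))
```

The in-sample error can only shrink as the degree goes up, which is exactly why it is a poor guide for choosing a curve. The hold-out error is the more honest number: a curve that chases every bump in the early games tends to extrapolate badly.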
After I run the regression on all players, I can use the fitted curves to create a matrix of predictions. For each game this season, it will contain a prediction of the number of points each player will score. My worry is that the variance is so large (our regression gets an R-squared of at most 0.15) that the predictions mean nothing.
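As a sketch of what the prediction matrix and the R-squared check might look like, here is one possible layout in Python with NumPy: one row per player, one column per upcoming game. The player names, the point histories and the helper names `r_squared` and `prediction_matrix` are all hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of variance explained: 1 - SS_res / SS_tot.

    Assumes y is not constant (otherwise SS_tot is zero).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def prediction_matrix(history, upcoming_games, degree=3):
    """Fit one polynomial per player and evaluate it on upcoming games.

    history maps a player name to (game_numbers, points).
    Returns (names, matrix) with one row per player.
    """
    names = sorted(history)
    rows = []
    for name in names:
        x, y = history[name]
        model = np.poly1d(np.polyfit(x, y, degree))
        rows.append(model(upcoming_games))
    return names, np.vstack(rows)

# Hypothetical history: 16 past games for two made-up players.
past = np.arange(1, 17, dtype=float)
history = {
    "Player A": (past, np.array([12, 18, 9, 22, 15, 25, 11, 19,
                                 24, 14, 27, 20, 16, 23, 18, 21], dtype=float)),
    "Player B": (past, np.array([8, 10, 14, 7, 12, 9, 15, 11,
                                 13, 6, 10, 16, 12, 9, 14, 11], dtype=float)),
}
upcoming = np.arange(17, 33, dtype=float)  # the next 16 games
names, matrix = prediction_matrix(history, upcoming)
print(names, matrix.shape)
```

An R-squared near 0.15 would mean the curve explains only about 15% of the game-to-game variance, so each entry in the matrix should be read as a rough tendency rather than a forecast.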




