In past seasons, to predict player performance I used a very rudimentary recommender system. For each player at each position, I would compute a z-score. This is basically a normalized performance number that usually falls between -2 and 2. Then, I could use a simple mechanism to predict the following season’s performance. It worked well in a few cases: it was fairly good at finding up-and-coming players and identifying strong performers. However, these players were usually well known anyway.
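A minimal sketch of the z-score step, assuming season point totals grouped by position. The player names and numbers are made up for illustration, and the population standard deviation is an assumption (the post does not say which variant was used).

```python
from statistics import mean, pstdev

def z_scores(points_by_player):
    """Return (points - mean) / std for every player in one position group."""
    values = list(points_by_player.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # Everyone scored the same, so nobody stands out.
        return {name: 0.0 for name in points_by_player}
    return {name: (pts - mu) / sigma for name, pts in points_by_player.items()}

# Hypothetical season totals for one position (not real data).
quarterbacks = {"QB A": 310.0, "QB B": 250.0, "QB C": 220.0, "QB D": 180.0}
scores = z_scores(quarterbacks)

for name, z in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {z:+.2f}")
```

A positive score means the player was above the average for the position, and a negative score means below it. Nothing forces the value into the -2 to 2 range; a truly exceptional season can land outside it.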
This year I am taking a different and more complex approach to prediction. I’m going to do some basic non-linear regression. Basically, I’m computing the non-linear “line of best fit” for each player. This is best explained visually:
The biggest problem is finding the right curve to “fit” the data to. The ones you see above are for third degree polynomials, so they have a characteristic wave shape. I tried a variety of other curves: linear, logarithmic, second degree polynomial, fourth degree polynomial and fifth degree polynomial. For example, here are the same three players but with the fifth degree polynomial regression:
The fifth degree may fit the data in a way which I don’t want. For example, look at Drew Brees above. The guy just won the Super Bowl, but if I were to trust the regression, it tells me that his performance is going downhill. The model fits the data “too well”; in other words, it overfits. So I’m going to opt for the third degree polynomial regression.
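To make the third versus fifth degree trade-off concrete, here is a small sketch of least-squares polynomial fitting in plain Python. The season totals are invented, the helper names (`polyfit`, `polyval`, `sse`) are hypothetical, and solving the normal equations directly is just one simple way to do the fit, not necessarily what was used for the plots.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - rest) / m[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def sse(coeffs, xs, ys):
    """Sum of squared errors of the fit on the given points."""
    return sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))

# Nine hypothetical seasons, with the season index rescaled to [-1, 1]
# so the powers of x stay well behaved.
xs = [(i - 4) / 4 for i in range(9)]
ys = [180.0, 210.0, 250.0, 240.0, 290.0, 300.0, 280.0, 310.0, 295.0]

fit3 = polyfit(xs, ys, 3)
fit5 = polyfit(xs, ys, 5)
sse3 = sse(fit3, xs, ys)
sse5 = sse(fit5, xs, ys)

next_season = 1.25  # one step past the last observed season
print("degree 3:", round(sse3, 1), round(polyval(fit3, next_season), 1))
print("degree 5:", round(sse5, 1), round(polyval(fit5, next_season), 1))
```

The fifth degree fit can never have a larger error on the seasons it was fit to, since every cubic is also a quintic with two zero coefficients. That lower error is exactly why it can be misleading: the extra flexibility goes into chasing noise, and the extrapolated value for the next season tends to swing more.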
After I run the regression on all players, I can use this data to create a matrix of predictions. For each game this season, it will generate a prediction of the number of points each player will score. My worry is that the variance is so large (our regression gets an R-squared of at most .15) that the predictions mean nothing.
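For reference, a short sketch of how R-squared is usually computed: one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the toy numbers are illustrative only.

```python
def r_squared(actual, predicted):
    """Fraction of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy numbers, not real player data.
actual = [10.0, 20.0, 30.0]
perfect = r_squared(actual, [10.0, 20.0, 30.0])   # model matches every point
baseline = r_squared(actual, [20.0, 20.0, 20.0])  # model always predicts the mean
print(perfect, baseline)
```

Read this way, an R-squared of .15 says the curve explains about 15% of the game-to-game variance and leaves the other 85% unexplained, which is the source of the worry above.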
To have the students conjecture about network properties, I generated a graph of the global structure using GUESS, developed by Eytan Adar at Michigan.
Here is another view of the same network laid out using the Fruchterman-Reingold algorithm.
This is one of the coolest visualizations I think I’ve ever done – and the structure reveals quite a bit about the music industry. Where do you think the popular artists are located? How come you’ve probably never heard of any of those guys that are in the outside “ring” in the above visualization?
The Arora and Barak text gives very detailed proofs of computational complexity topics (complexity classes, P, NP, Turing machines, etc). They don’t spend much time on languages, alphabets, or basic concepts. For these topics the Sipser book is phenomenal. Part 1 of the Sipser book feels like a linguistics textbook.
In his book he proves Theorem 1.25:
The class of regular languages is closed under the union operation.
A regular language is one that is accepted by a finite automaton, a finite state machine. Think of it like this: you are a finite automaton. When you hear someone speak, you construct meaning out of the arbitrary utterances. Take the word “automobile” for example. When someone says that word, you hear:
aw-tuh-moh-beel
So, we define an alphabet to be the set of all possible arbitrary sounds (there are probably a lot!). Then we define a regular language to be a collection of combinations of these sounds that we “accept”, that is, ones that have some meaning to us. For some people the word “automata” might not even be in their language. Technically speaking, I would argue that no two humans have the same language. I can converse with most English speakers, so it is likely that the intersection of my language and theirs is large.
Back to his proof. His claim is that if A and B are regular languages, then the union of the two is also a regular language. Let’s take two different languages: English and Spanish. I won’t go over the details of his proof in this blog post (it’s good), but the basic steps are:
Construct a machine M to recognize both English and Spanish (i.e. a bilingual person)
We can construct its set of “states” as a combination (Cartesian product) of the states of the machines for each language. We assume the alphabet of both languages is the same (in this case, they’re very similar)
Our machine (bilingual person) can accept a word in whole as an English word or a Spanish word but not as a combination of the two (no Spanglish!)
To prove the claim, we prove the existence of a bilingual person. This is such a fun application!
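The construction can be sketched in a few lines of Python. This is an illustration of the product construction, not a transcription of Sipser’s proof: the dictionary layout, the function names, and the two toy machines (strings ending in “a”, strings of even length) are all assumptions chosen to keep the example small.

```python
from itertools import product

def union_dfa(dfa_a, dfa_b):
    """Product construction: run both machines in lockstep on a shared alphabet."""
    alphabet = dfa_a["alphabet"]
    states = set(product(dfa_a["states"], dfa_b["states"]))
    delta = {
        ((p, q), sym): (dfa_a["delta"][p, sym], dfa_b["delta"][q, sym])
        for (p, q) in states for sym in alphabet
    }
    # Accept when either component machine is in one of its accept states.
    accept = {(p, q) for (p, q) in states
              if p in dfa_a["accept"] or q in dfa_b["accept"]}
    return {"alphabet": alphabet, "states": states, "delta": delta,
            "start": (dfa_a["start"], dfa_b["start"]), "accept": accept}

def accepts(dfa, word):
    """Feed the word to the machine one symbol at a time."""
    state = dfa["start"]
    for sym in word:
        state = dfa["delta"][state, sym]
    return state in dfa["accept"]

# Toy machine 1: accepts strings over {a, b} that end in "a".
ends_in_a = {
    "alphabet": {"a", "b"}, "states": {0, 1}, "start": 0, "accept": {1},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0},
}
# Toy machine 2: accepts strings over {a, b} of even length.
even_length = {
    "alphabet": {"a", "b"}, "states": {"even", "odd"},
    "start": "even", "accept": {"even"},
    "delta": {("even", "a"): "odd", ("even", "b"): "odd",
              ("odd", "a"): "even", ("odd", "b"): "even"},
}

bilingual = union_dfa(ends_in_a, even_length)
for word in ["a", "bb", "b", "bab"]:
    print(repr(word), accepts(bilingual, word))
```

Each state of the combined machine is a pair that tracks where both original machines would be after reading the same input, so a word is accepted exactly when at least one of the two machines accepts it in whole.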
I am not a linguist, so maybe my musings are way off the mark. Please drop me a comment if they are; I see lots of connections between what we do in computational complexity and what linguists do!
I spent the last three days at a Teachers Workshop in the Maine School District on a pedagogical tool, problem based learning (PBL). Those of us in higher education (especially engineers) are familiar with the subject, although you probably call it something else. Briefly, the concept is: teach some material by presenting a “messy” problem to students. This provides motivation and you can build lessons and discussions around the subjects they “need to know” to attack the problem.
This pedagogical tool is popular in engineering schools but often poorly implemented. The first course that comes to mind, for me, is a course on algorithms. At the beginning of a semester, give the students a setting where the underlying problem is NP-hard. Most of the students will probably struggle with the problem until you cover basic complexity and approximation algorithms. Perhaps they could use a randomized algorithm to attack the problem. This may motivate them to pull up conference proceedings and read more about the problem and approximations. You could pick a problem like the “School Timetable” problem or Multi-Commodity Flow. With undergraduates, you want to help them avoid this problem with their future employer [ From the Garey Complexity Book ]
On a slightly unrelated topic – at the end of the workshop, one of the facilitators played a video clip I have seen at least 2 times before in class at Michigan’s School of Information. You can watch the video below:
In the context of talking about problem based learning, I can understand why the video is interesting. If we propose open-ended messy problems (like Ideo’s shopping cart problem) to students, we can motivate them to think abstractly and creatively about problem solving.
I do have a problem when people show this video to students, especially undergraduate and/or high school students to motivate them to become engineers, designers, and/or researchers. My problem is this: with high probability they won’t be working at Ideo or a company that designs like Ideo. If they go into engineering, they’ll be working for an {engineering/IT/manufacturing} firm, a start-up, or a University. General Motors or Google probably won’t let you have an airplane wing above your cubicle or let you hang your bike from the ceiling. The design process is more measured and your research/lit review/observations take far longer than a day. But it may still be an interesting, demanding, and creative process. Then, when these students start working at these companies or work towards a degree, they’re going to be disappointed when it is not as sexy as we told them it was.
Let’s not lie to our potential future engineers; we’re just going to disappoint them. There are lots of really cool engineers and projects out there that are not as sexy as Ideo but require the same tools. How about one of the engineers that designed the Chevy Volt? What about a Professor that works on a Cyclotron or the CERN project? Let’s get students involved in engineering and design for the right reasons – not because it is sexy.
If you’re not familiar with the current Google/Verizon Net Neutrality Drama – read this good and brief article by Jonathan Zittrain. As an aside, I heard him debate Bruce Schneier in an Oxford-style debate some months ago on the topic of, “Is the Internet Security Threat Overblown?” He is a professor and I’m not sure if he has a J.D. but he certainly talks like he does. Schneier really knows his stuff and I think Zittrain won the debate, even though I initially disagreed with his side.
Back to the drama. I find the narrative of this story more fascinating when you consider the results of the 2008 FCC 700 MHz spectrum auction. In the super-valuable C-block of the auction, Google convinced the FCC that if the total amount paid was over some amount (it was like $4 billion) then “open-access” provisions would apply to that band. Knowing that Verizon was really the only party capable of winning this spectrum (and Verizon had clearly signalled interest), Google bid the auction up to the reserve $4 billion price tag. Then, they dropped out, letting Verizon win the spectrum. This mandated that Verizon maintain some degree of platform neutrality on their future wireless platform. Way back in 2008, Google and Verizon made an uneasy alliance on the wireless side, even though if you read their filings in the Federal Register, they seem to despise each other.
Fast forward to 2010. I look at the industry and see the following:
Advances in wireless devices are rapidly outpacing network capabilities. AT&T is starting to limit bandwidth and my personal T-Mobile data plan is insanely slow. It’s becoming increasingly clear that wireless isn’t going to be the great 4th Internet platform (after Cable, DSL/Fiber, and Satellite)
Verizon is deploying Fiber to the household. AT&T is selling television. Comcast is selling phone services. Everyone is in everyone else’s business. It’s clear that the home internet/tv/phone bundle isn’t going away anytime soon.
Comcast is pushing their weight around. They own content now: GE (i.e. NBC). They’ve clearly been lobbying the Obama FCC pretty hard and successfully.
The FCC isn’t really doing anything. I was expecting the Obama FCC to enact all of these great changes in the first 100 days (i.e. net neutrality) but they’ve been basically sitting on their hands.
Now, let’s pretend you own Google. You do content. You deliver content. You make your money selling ads on said content. Comcast is doing all of that too – and they own the pipes. AT&T wants to do that too. The FCC/FTC isn’t doing anything. So, you make an uneasy (and unclear) deal with Verizon – a company that you’ve worked with before (Droid, 2008 Auctions) to scare the FCC into doing something. Sounds pretty reasonable to me.
Of course, my friends tell me that I’m a horrible Google apologist. So maybe some of my friends in the policy world can shed more light on the situation for me. Is Google really trying to be evil? Or are they just trying to spook the FCC into actually doing something? (As an aside: I never would have thought the Bush-appointed Michael Powell FCC would be more proactive in fostering competition.)
In an earlier blog post a week ago, I mentioned that there was a possible proof which answered the long-time open CS theory question: does P = NP? The researcher claimed to have shown that P is not equal to NP in a 100 page proof. The proof used complexity concepts that I am just not familiar with (yet), so I am not qualified to judge the proof. However, there are a few researchers around the world who are and it appears that, generally speaking, the proof probably won’t hold up. This isn’t to say that there aren’t valuable findings in the paper, there are, but it doesn’t show that P!=NP.
There are two blogs that have great coverage of this topic and I suggest reading them:
A few weeks ago, I saw the trailer for the upcoming film “Social Network” – about Mark Zuckerberg and Facebook. You can watch it below:
This morning, ironically on her Twitter, danah boyd (@zephoria) posted a link to this parody about Twitter. Hilarious.
As someone who researches social networks, I’m not sure what I think about the upcoming film. On one hand, publicity for your field of research even in name alone is probably a good thing. On the other hand, it will probably give people the impression that I research facebook or twitter, which I usually don’t. Social networks exist in almost every facet of our lives not just online. I secretly hope the film at least conveys some notion of this and is not just a Facebook plug.
It is August now, which means I start preparing for this year’s fantasy football season. Now that my old UM blog is defunct, I am going to bombard all of my personal blog readers with insights into this year’s fantasy football season. My insights are mostly academic: I’m interested in statistical analysis and developing machine learning algorithms for this setting. Some of my posts will be fun (and/or interesting) findings and others will probably be more mathematical. All my data is from NFL.com for the 2001-2009 seasons.
Here’s a fun plot that tells you basically nothing but is cool to look at. You can see some interesting patterns – the spacing between points is because of the point system (teams tend to score in multiples of 7 and 3). That outlier on the lower right is Tom Brady’s 59-0 destruction of Tennessee in 2009.
That looks cool but it doesn’t really tell us much information of use. What about the numbers? The mean score for a home team is 21.91 and for an away team is 19.39. That’s almost a field goal difference. Both home and away scores have the same standard deviation: exactly 10 points. Does this have any insight into your fantasy team? Not really: your players all play the same number of home and away games. Overall, you may get a bump when you have more players at home than away but not necessarily. Interestingly, the scores probably aren’t correlated, so one team having a big day does not necessarily mean the other team has a big day. And conversely, they’re not anti-correlated either (correlation coefficient -0.02, p-value = .25).
In case you’re curious, here is what the distribution of scores for home and away teams looks like:
Both are pretty skewed to the right and have a strange drop from 10-15 points. This is a little surprising. You would expect many teams to end with a score of 14 but I guess teams will try to squeeze in a field goal or something. I’m not sure how to explain this.
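The correlation number quoted above is a Pearson correlation coefficient. Here is a small sketch of how it is computed, using a handful of made-up home and away scores rather than the NFL.com data. The function name is hypothetical, and the sketch does not compute the p-value.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical final scores for six games (not real results).
home = [24, 17, 31, 10, 27, 20]
away = [21, 20, 14, 13, 30, 17]

r = pearson(home, away)
home_edge = mean(home) - mean(away)
print(f"home edge: {home_edge:.2f} points, correlation: {r:.2f}")
```

A value near zero, like the -0.02 reported above, means that knowing the home score tells you almost nothing about the away score in the same game.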
After my last rant about Facebook’s crappy php-sdk, I gave up on using PHP for Facebook applications and switched over to their new Facebook Connect / Javascript SDK. For a multitude of reasons, I strongly dislike Javascript so I was a little reluctant. This API is also poorly documented but at least there are more working code samples available online and an example that works with prototype.js, my favorite Javascript library. I have to admit, though, I’m impressed with this library. You can do some really cool things all on the client-side now. You can play around with my under-development application if you like.
In completely unrelated news, I have fulfilled my search to find a good weeknight jazz radio station since Ed Love got booted from the weeknight slot at WDET in Detroit. From 9pm-12, I can pick up WWOZ out of New Orleans, LA which hosts a show called the “Kitchen Sink”. Sometimes they deviate and play non-jazz but it is usually pretty good.
If you’re still interested in the P!=NP coverage, the best source of information is at the Godel’s Lost Letter and P=NP blog. It sounds like the proof probably isn’t going to hold up to scrutiny but it will still lead to some valuable and interesting findings.
Hello Facebook. I remember a hot August night a few years ago when you first launched your apps platform. At the time, it was a pretty revolutionary idea: give developers access to your users’ data and see what kind of fun programs they can hack up. I was so excited, I ordered a pizza and pulled a late night “hack-a-thon” in my Grosse Pointe apartment trying to put something together. I hung out in the #facebook irc chat room with other geeks doing the same thing. Some people were starting ad networks, some people were coding and hacking, and others were just observing and helping. There was even a Facebook employee chilling in the chatroom, helping folks out. It was a renaissance of the geeky kind.
It was also an absolute blast. Your API wasn’t well documented but the code was easy to read and well commented. The wiki you put up was helpful as coders were adding snippets of information. FBML was a little tricky but I picked it up quickly. The API was simple and powerful. The REST concept was pretty easy to understand. Javascript was minimal and your PHP SDK was the best I had ever seen, at that time.
It’s a few years later now and I am tasked with developing a new application, as part of a project I am working on. I think it’s a cool idea that may enhance the value of your platform. I pull up some of my old code, hoping I can make a few changes and run with it. I download the SDK, only to find you’ve deprecated the old platform and moved to a totally new one (graph API). “OK” I think and I start toying around with your new platform – then my requests start timing out (with no notice) and I have to stop developing. I wait a few hours and things are working again; I can continue developing. I notice that you’ve completely altered your authentication mechanism and totally rewritten your PHP SDK.
So, I look up your documentation. Still woefully inadequate and you appear to have replaced the wiki with a forum. As I code, I start to encounter random problems – mysterious error messages, timeout errors, unclear bugs, features I want. I submit them on the forums, only to get a single response or none. I ask about PHP in the irc channel and am basically told that everyone uses Javascript now, so all my old code and knowledge of the platform is useless. I start clicking around for code samples, only to find you provide 1 sample of PHP code. I surf to other websites, looking for code samples, but most of those are out of date now. Have you totally abandoned PHP? If so, just get rid of the SDK.
I’m suggesting this: If it’s not a fully functional SDK, get rid of it. If you prefer one language to another (i.e. Javascript), tell your developers, so we don’t waste our time using a non-fully functional SDK. After two or three years, you could have at least typed up more advanced documentation. At first, it made sense that there was no documentation but by this point, why isn’t there any? Your developers website provides API documentation but does not mention any SDK – why? Are you intentionally trying to scare away developers? Finally, would it be that bad to have a Facebook employee in the IRC channel every once in a while? Just to answer random questions?