Personal Website for Tom Hayden

Another Fantasy Post: 2011 NFL Season PPG

October 22nd, 2011

I haven’t made a blog post in a while. I’ve been pretty busy. I started working for Facebook in June and it’s been a wild ride. I moved from Chicago to Austin, spent my summer in the SF Bay Area, and finally got settled down for a bit here in Texas.  I was sitting on a flight today and decided to do a quick analysis of 2011 versus previous NFL seasons. I wanted to test the following:

  • Have the rule changes this year impacted scoring in any significant way?
  • What about betting? Have the over/under lines changed in any noticeable way this year?
We can quickly plot the points per game and over-unders using data from 2002-2011:
Nothing really stands out here: the variance in 2011 is smaller, but we’re only 5 games into the season. The betting over/unders seem to be centered around the mean value, as we’d expect. We can do some hypothesis testing and find that there is not a statistically significant increase in points per game this year (p-value = 0.455). What about betting? There is possibly a very weak, marginally significant increase in the value of over/unders this year (p-value = 0.07). That’s not a particularly low p-value, and given our sample size it’s not totally convincing, but it’s not enough to throw out either.
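The post doesn’t say which hypothesis test produced these p-values. As one possible sketch (a swapped-in technique, not necessarily the one used here), a one-sided permutation test asks whether 2011 games score more on average than games from earlier seasons. The inputs are assumed to be plain lists of per-game point totals; the function name, its parameters, and the sample numbers are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(current, baseline, n_iter=10_000, seed=0):
    """One-sided permutation test for mean(current) > mean(baseline).

    Shuffles the pooled observations, re-splits them into groups of the
    original sizes, and counts how often the shuffled difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(current) - mean(baseline)
    pooled = list(current) + list(baseline)
    n_current = len(current)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_current]) - mean(pooled[n_current:])
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never 0.
    return (hits + 1) / (n_iter + 1)


# Hypothetical per-game totals, for illustration only.
earlier_seasons = [38, 41, 44, 47, 40, 43, 46, 39, 42, 45]
season_2011 = [44, 49, 41, 46, 48]
print(permutation_p_value(season_2011, earlier_seasons))
```

The same call works for the over/under lines: pass the 2011 lines as `current` and the 2002-2010 lines as `baseline`. A permutation test makes no normality assumption, which is convenient when one group is only a few weeks of games.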
Another interesting plot that I generated – not sure what is really actionable from it but you can see the distribution of points-per-game and over-unders by week.
It’s not surprising that games early in the season tend to have fewer points per game than games in the middle of the season.

Falling out of the Ivory Tower

May 11th, 2011

This post has been a long time coming and took me quite a while to finally type out. The big news is: I’m leaving my PhD program at Northwestern after 2 years and returning to whence I came: industry. I expect to get some flak for this decision, and I feel like a blog announcement might clarify my thought process for myself and others. I will try to avoid being overly sentimental or bitter, although I am a little of both. Perhaps even making a blog post about this is uncouth, so caveat emptor.

My experiences with the academe were both positive and negative. I absolutely enjoyed every chance that I had to teach and work with students. I was an NSF GK-12 fellow this year and spent a great deal of time in the classroom with high school students. It was a fun and rewarding experience, definitely the highlight of my time here. I enjoyed being a teaching assistant (TA) in my first quarter at Northwestern; working with undergraduates was enlightening and interesting. I loved the environment and collegial nature of my department. My labmates/traveling scholars are absolutely awesome people and I consider them friends outside of academia.

To a certain degree, I also enjoyed the coursework at Northwestern. I wasn’t nearly as prepared as I should have been when I started; I had very little math background. It took me a full year of self-study to get up to speed so that I could do reasonably well in the courses. The microeconomics sequence at Northwestern was both intense and intellectually fulfilling (except for the parts on strange equilibrium concepts). The computer science graduate courses were also intense and challenging. I think I enjoyed Northwestern’s Graduate Computational Complexity course the most, although I was lost through a large portion of the last 3 weeks (as my class #30 scribe notes may indicate). I walked away from that course with a feeling of accomplishment (even though I got a B).

To me, the sticking point was research. I was never able to find a niche that met the following necessary conditions for academic success (in my mind):

  1. I would be good at it and happy doing it.
  2. Someone would be willing to work with me.
  3. I thought it was (subjectively) valuable and worthy research.

I worked on a few different research projects and many of them met two thirds of the criteria. In fact, I think I have hit all of the possible combinations at one point or another, except for all three. At this point, the clock has been ticking and I had to start preparing for my qualifier exams (basically a lengthy paper) and my prelims (my pre-dissertation defense). I definitely had some projects that I could have used for my qualifier exams but they only met 2/3 of the conditions above. I couldn’t get motivated to work on something that I wasn’t convinced would lead to a positive outcome for me.

Perhaps I should have tried to punch through and spend the next 3-4+ years on a project that met 2/3 of the criteria; I’m sure many PhD students do this: you do research that someone else thinks is important, get the PhD, work as a post-doc, find a faculty job, get tenure (after 6 years), and then lead your own research. Or you spend 3-4 years doing research, get the PhD, and return to industry with the PhD. And of course, there are many shades of grey, along with tons of uncertainty about future faculty positions and the increasing reliance on post-docs.

None of this is meant to pass judgement on others; I know many of you are really passionate and intense about your research and academic futures, and I know many of you are going to go on to do awesome stuff. The passion that you feel about your work in academia is something I was never able to experience, so grab onto it and run with it. There are a lot of people doing really interesting research that is socially relevant to the world we live in.

Northwestern has been generous in giving me time to find my research “calling” without having to worry too much about finding funding and justifying my work. However, at this point, I feel pressure to either settle down and start working on something dissertation-worthy or move on. With great hesitation, and with opportunities elsewhere, I’m choosing the latter option: move on and return to the world of the working.

Overall, I think it has been a rewarding and character building experience. I definitely feel like I reached the point of mathematical maturity that I was seeking to reach. I met some really amazing people and still had free time to do some research that I thought was cool, which you can read on this blog. I’ve enjoyed living in Chicago and spending time on Northwestern’s pretty campus and look forward to the next phase in my life.

Announcing: Hiphop Network Database Project

April 13th, 2011

So, I haven’t posted to the blog in a while because I’ve been super busy working on some cool projects. The coolest of them is a database called the Hiphop Network Database Project. It’s not sitting on an actual domain yet because the site isn’t yet 100% complete, so you’ll be accessing it through that bit.ly link above.

Anyway, it is based on a super cool dataset I collected from several sources (Amazon, Ohhla, and a previously constructed dataset from another researcher). I’m trying to build a network of the entire hiphop scene: who has collaborated with whom, and when. I’ve enlisted some of the students at the school where I teach to help clean the data. Below is the initial visualization of the data.

The main hypotheses I think I want to test are the following (some of these ideas were suggested by a twitter friend and fellow researcher here at Northwestern):

  1. Hip-hop networks have a greater amount of assortativity.
  2. Hip-hop network data has interesting dynamic assortativity. This isn’t really a hypothesis because I’m not sure how to formulate the concept of dynamic assortativity into something testable yet. So for now, we’ll just say it’s “interesting”.
  3. Hip-hop collaborations have changed our language. I think we might be able to do some kind of clustering around specific words and see how they influence artists – is there some kind of “shizzle” contagion?
  4. There are communities of hiphop artists formed around certain attributes {money, prison, studios, certain producers, cities}. In the case of money, this is almost certainly true but I’m curious to know if there is any kind of geographic effect. I had a student that is very familiar with hiphop suggest that it has no effect – it’s all about money.
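For hypothesis 1, one standard way to put a number on assortativity is Newman’s degree assortativity: the Pearson correlation between the degrees at the two ends of every edge. The post doesn’t specify which measure it has in mind, so this is a sketch under that assumption; the edge-list input and artist names are hypothetical. NetworkX’s `degree_assortativity_coefficient` computes the same quantity and could serve as a cross-check.

```python
from collections import defaultdict
from math import sqrt


def degree_assortativity(edges):
    """Newman's degree assortativity for an undirected collaboration network.

    `edges` is an iterable of (artist_a, artist_b) pairs, one per
    collaborating pair (no duplicates, no self-loops). Each edge contributes
    its endpoint degrees in both orientations, and the result is the
    Pearson correlation of those degree pairs: positive values mean
    well-connected artists tend to collaborate with each other.
    """
    edges = list(edges)
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    xs, ys = [], []
    for a, b in edges:
        xs += [degree[a], degree[b]]
        ys += [degree[b], degree[a]]

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("assortativity is undefined when every node has the same degree")
    return cov / sqrt(var_x * var_y)


# Hypothetical: one producer who works with everyone, artists who only work with him.
star = [("producer", "artist_1"), ("producer", "artist_2"), ("producer", "artist_3")]
print(degree_assortativity(star))
```

A star like this is perfectly disassortative (−1). “Greater assortativity” still needs a reference point, for example the same statistic on another collaboration network or on degree-preserving randomizations of this one.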

I’d post the data to this blog for people to download but there is an entire website devoted to it now, so you can go visit the Hiphop Network Database Project.

Sampling with Sparse and Biased Location-based Data

March 12th, 2011

There’s an article over on Mashable about using data from location-based sites like Foursquare.com to estimate how busy (or not busy) places are on certain days. You can read about the specific site, When Should I Visit? Here’s a sample of what they’re talking about, which I stole from the Mashable article:

A few of the commenters rightly noted that this is based solely on data from foursquare and does not consider the vast majority of visits to the Hall that are not logged on the social networking site. This is sparse data: we have a sample of people at this location, but it is maybe only a small fraction (less than 5%?) of overall visits. I claim that providing basic summary statistics (like the above) is not very informative, for a few reasons:

  • Sample Size: What if the sample size is very small? If 12 people have checked into this venue, we might see something like the above graph. However, this basically means nothing; our sample size is too small. Here’s an example I found on the site, where our sample size is probably 2. You don’t need to be a statistician to know that this tells you nothing. They could probably remedy this by setting a minimum threshold and using a sample size calculator.
  • Bias/Autocorrelation: I am guessing that the average foursquare.com user fits a tight demographic: probably mostly male, upper-middle-class, (sub)urban white folks. This would imply that there is probably some correlation between users. For example, if we both have 9-5 jobs, our behavior after 5pm may be correlated: bars on Friday nights, concerts on the weekends, etc. Even if this site used more robust statistical techniques, it would be estimating the population of young (sub)urban white folks at venues. I’m not totally sure how they’d be able to statistically account for this yet, but since they know the demographics of foursquare users, there is probably some kind of sampling technique they can use.
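Both remedies above can be sketched in a few lines. The first function is the textbook sample-size formula for estimating a proportion (e.g. the share of a venue’s visits that fall in a given hour), with an optional finite-population correction; treating “how busy” as a proportion is an assumption. The second is post-stratification, one standard sampling technique for the demographic bias: reweight each group’s sample mean by that group’s share of the real visitor population. Both function names and the groups and shares in the usage are hypothetical, and post-stratification only helps if you actually know the visitors’ demographic mix (foursquare knowing its own users’ demographics is not enough on its own).

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Check-ins needed to estimate a proportion to within +/- `margin`.

    Uses n0 = z^2 * p * (1 - p) / margin^2 (p = 0.5 is the worst case),
    optionally shrunk by the finite-population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z * z * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return ceil(n)


def poststratified_mean(samples_by_group, population_share):
    """Reweight group means by each group's (known) share of all visitors.

    Every group in `population_share` must have at least one sample, and
    the shares should sum to 1.
    """
    return sum(
        share * (sum(samples_by_group[g]) / len(samples_by_group[g]))
        for g, share in population_share.items()
    )


print(required_sample_size(0.05))  # +/- 5 points at 95% confidence
# Hypothetical groups: young users are over-represented among check-ins.
print(poststratified_mean({"young": [1, 1], "older": [3]}, {"young": 0.5, "older": 0.5}))
```

For a ±5 percentage point margin at 95% confidence, the formula calls for a few hundred check-ins, far more than the 2 or 12 in the examples above.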

Either way, it’s still a cool idea. I can see places where this would be really valuable, like at amusement parks and museums, where the quality of your experience is inversely correlated to the number of other people there with the same preferences as you.

Google, to some degree, uses this technique for their traffic prediction algorithms. They look at where Android users are and how fast they are going, and can generally use this to sample traffic conditions. As an aside, I’m still not sure how they can tell when people are walking or driving. If you know this, please post a comment.

Best College Basketball Players

March 12th, 2011
In an earlier post, I argued that for basketball, I like the per-minute efficiency statistic as a way of assessing player contribution. For one, it includes defensive stats, so it is a bit more robust than a simple offensive measure. Second, it is a pretty good metric for comparing players across the league, since the distribution of efficiency is approximately normal (with a bit of a skew and a slightly longer tail on the high end).

Fortunately, with college players, our data set is pretty large. I drop all players with fewer than 250 minutes, and we’re left with 2,855 players (for 2011 data only). We can look at the top players. Pay attention to the column on the far right; this is the player’s average efficiency per minute on the court.

                              pid minutes team efficiency       avgeff
1470          MORE-Kenneth Faried    1142 MORE        930  0.814360771
3150             BUF-Javon McCrea     661  BUF        499  0.754916793
4280          NCAT-Thomas Coleman     989 NCAT        740  0.748230536
926           KAN-Thomas Robinson     415  KAN        308  0.742168675
3019             OAK-Keith Benson    1094  OAK        800  0.731261426
923           KAN-Markieff Morris     796  KAN        581  0.729899497
1934          RICE-Arsalan Kazemi     954 RICE        680  0.712788260
4131        ARIZ-Derrick Williams     975 ARIZ        687  0.704615385
901           IONA-Michael Glover     542 IONA        380  0.701107011
1098            NORF-Kyle O`Quinn    1050 NORF        731  0.696190476
1761            ODU-Frank Hassell     963  ODU        659  0.684319834
2550             TXST-Tony Bishop     811 TXST        553  0.681874229
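The filtering and ranking step described above can be sketched as follows: drop players under 250 minutes, divide efficiency by minutes, and sort. The rows below reuse three players from the tables in this post plus one hypothetical low-minute player, who shows why the threshold matters. The dict keys are assumed to match the printed column names; the function name is hypothetical.

```python
def top_by_efficiency(rows, min_minutes=250, k=10):
    """Drop players under `min_minutes`, then rank by efficiency per minute.

    Each row is a dict with 'pid', 'minutes', and 'efficiency' keys
    (column names assumed to match the table above).
    """
    ranked = [
        (row["pid"], row["efficiency"] / row["minutes"])
        for row in rows
        if row["minutes"] >= min_minutes
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


rows = [
    {"pid": "MORE-Kenneth Faried", "minutes": 1142, "efficiency": 930},
    {"pid": "BUF-Javon McCrea", "minutes": 661, "efficiency": 499},
    {"pid": "MICS-Korie Lucious", "minutes": 440, "efficiency": 53},
    # Hypothetical bench player: a great rate on too few minutes to trust.
    {"pid": "XXX-Bench Player", "minutes": 120, "efficiency": 100},
]
for pid, avgeff in top_by_efficiency(rows, k=3):
    print(f"{pid:22s} {avgeff:.4f}")
```

If you load 2011_ncaabb_efficiency.csv with `csv.DictReader`, convert `minutes` and `efficiency` to numbers first, and check that the file’s column names really match the printed table.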

I don’t really know much about college basketball players (or the teams these guys play for) but if I was an NBA team, I would certainly be recruiting these guys pretty hard.  If you’re an NBA team reading this (or not), you can download my 2011 efficiency data file: 2011_ncaabb_efficiency.csv (121KB)

I can look at the players on my beloved MSU Spartans to find their efficiency values. Yuck.

                      pid minutes team efficiency    avgeff
1382 MICS-Durrell Summers     931 MICS        176 0.1890440
1383     MICS-Kalin Lucas    1062 MICS        336 0.3163842
1384  MICS-Draymond Green     956 MICS        533 0.5575314
1385      MICS-Delvon Roe     756 MICS        334 0.4417989
1386 MICS-Garrick Sherman     396 MICS        156 0.3939394
1387   MICS-Keith Appling     726 MICS        182 0.2506887
1388 MICS-Austin Thornton     360 MICS         76 0.2111111
1390   MICS-Adreian Payne     291 MICS        125 0.4295533
4837   MICS-Korie Lucious     440 MICS         53 0.1204545
4838     MICS-Mike Kebler     301 MICS         76 0.2524917

Of course, comparing a player at Michigan State with someone at Morehead State isn’t completely fair. The level of competition the two teams face varies widely, since their schedules may be completely disjoint. For example, Michigan State plays a lot of games against other teams in the Big 10. I don’t even know what division Morehead State is in, but I’d guess the level of competition is much lower than in the Big 10. So ultimately, for this statistic to be valuable, we need to normalize it by the level of competition, using something like the SRS or SOS stat for each team. Perhaps I’ll update this with that data later, after the tournament, or you can do it yourself using the data file above.
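As a first step toward that normalization, here is a sketch of an SRS-style team rating: each team’s rating is its average margin of victory plus the average rating of the opponents it played. This fixed point is the same as the least-squares solution of margin ≈ rating_a − rating_b. The damped, mean-centered iteration is my own choice for solving it, home-court advantage is ignored, and the `(team_a, team_b, margin)` input format and function name are hypothetical. How to fold the resulting ratings into player efficiency is left open.

```python
from collections import defaultdict


def simple_rating_system(games, max_iter=10_000, tol=1e-9):
    """Schedule-adjusted team ratings in the style of SRS.

    `games` is a list of (team_a, team_b, margin) tuples, where margin is
    team_a's points minus team_b's points. The fixed point satisfies

        rating[t] = avg_margin[t] + mean(rating[opponent] for each game)

    and is found with a damped, mean-centered iteration. The schedule is
    assumed to be connected (every team linked to every other through
    some chain of games).
    """
    opponents = defaultdict(list)
    margins = defaultdict(list)
    for a, b, margin in games:
        opponents[a].append(b)
        opponents[b].append(a)
        margins[a].append(margin)
        margins[b].append(-margin)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)
    for _ in range(max_iter):
        target = {
            t: avg_margin[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in ratings
        }
        # Averaging old and new values damps the oscillation a plain
        # iteration can fall into; re-centering pins the ratings to mean 0.
        damped = {t: 0.5 * (ratings[t] + target[t]) for t in ratings}
        shift = sum(damped.values()) / len(damped)
        new = {t: r - shift for t, r in damped.items()}
        change = max(abs(new[t] - ratings[t]) for t in ratings)
        ratings = new
        if change < tol:
            break
    return ratings


# Hypothetical three-team schedule with perfectly consistent results.
print(simple_rating_system([("MICS", "MORE", 10), ("MORE", "BUF", 10), ("MICS", "BUF", 20)]))
```

A rating of +10 means “about 10 points better than an average team on this schedule”, which is the kind of adjustment a Big 10 versus mid-major comparison needs.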

NCAA Season Data

March 10th, 2011

For those of you interested in building an algorithm to predict 2011 NCAA Tournament brackets, I’ve compiled a data file with all games played by most NCAA teams in the 2011 season. The file is available: 2011_season.csv (259kb)

A few early (and interesting) metrics from the file: home-court advantage is roughly 7 points in the NCAA, and the average number of points per game is 68.69. This is unlike the NBA, where the average points per game is roughly 100 with a 3-point home-court advantage. Generally, home-court advantage is a big deal in college hoops. Fortunately, for most games in the tournament, this factor is neutralized (although an ostensible home-court advantage didn’t help my team, the Spartans, when they played the NCAA championship game in Detroit). We can see that the distribution of points is pretty normal, as expected:
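The two headline numbers can be sketched as simple averages: the mean home-minus-away margin over non-neutral games, and the mean points per team per game (assuming, as the NBA comparison suggests, that 68.69 is per team). The dict keys, including a `neutral` flag, are hypothetical; I don’t know the actual columns of 2011_season.csv.

```python
from statistics import mean


def home_court_summary(games):
    """Return (average home margin, average points per team per game).

    Each game is a dict with hypothetical keys 'home_pts', 'away_pts', and
    'neutral' (True for neutral-site games, which are left out of the
    home-court estimate).
    """
    home_margins = [g["home_pts"] - g["away_pts"] for g in games if not g["neutral"]]
    team_scores = [pts for g in games for pts in (g["home_pts"], g["away_pts"])]
    return mean(home_margins), mean(team_scores)


games = [
    {"home_pts": 75, "away_pts": 65, "neutral": False},
    {"home_pts": 70, "away_pts": 66, "neutral": False},
    {"home_pts": 60, "away_pts": 64, "neutral": True},
]
print(home_court_summary(games))
```

One caveat: in college basketball, strong programs host many weaker opponents, so a raw average margin mixes true home-court advantage with team strength and probably overstates it. A regression with a team-strength term for each side would separate the two.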

That’s all I have for now. I’m competing in a NCAA competition, where we have to fill out the brackets using some computer generated method. I’ll post my bracket when it’s ready!

Hip Hop Word Usage is Power Law!

January 30th, 2011

A commenter on my previous post, who is also one of the Northwestern GK-12 Fellows and a physicist, pointed out the error of my ways. In trying to show that word usage in hip-hop follows a power law, I put only one axis of my graph on a log scale, not realizing that I needed to do the same for the other axis. This is what led to my confusion over having discovered some ultra-power law thing. Thanks, Dan, for your response; this was kind of bothering me this afternoon.

After loading the data and replotting using the correct axes, we get:

The linear-looking fit indicates that it probably does follow a power law distribution. Laszlo Barabasi, Dr. Dre, and I can sleep peacefully tonight having made this discovery. To my non-science friends, I should explain a little bit about why this matters. You may have heard of the 80/20 rule, where 80% of the work is done by 20% of the people. Scientists observe this kind of skew in real data all the time (and we can see it by looking at the log-log plot of the data, as I did above). Examples include:

  • Your # of friends on a social networking site. Some people have an enormous # of friends, whereas most people have a much smaller number.  The hip-hop collaboration network I posted months ago exhibited this behavior.
  • Internet topology – there are a few routers handling most of the traffic and many not handling much at all. This is from Laszlo Barabasi’s book Linked.
  • Natural languages generally are known to exhibit power law behavior. (I think) this is usually referred to as Zipf’s Law.  This is what makes the question we’re asking here interesting – does the language of hip-hop lyrics conform to this law? (the answer is most likely yes)

To celebrate this discovery, I am making the data file available: hiphop_words.txt (2748Kb) with a few caveats:

  • The wordlist is scraped from the hip hop lyrics website ohhla.com, which uses user-contributed content. There are tons of misspellings, incorrect words, and other assorted randomness in the file. I tried my best to clean out the random noise, but there’s a lot of it.
  • The wordlist is full of obscenities. Download at your own risk. I am not responsible for this. I guarantee this is the most obscene data set that you will ever use.
  • The format is “word,# of times used”.
  • If a word is used more than once in a song, I count every use. For example, the word “whoomp” is used 25 times, probably solely from the single song “Whoomp! (There It Is)” (watch the song on YouTube).
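Given the “word,# of times used” format, the log-log check from earlier in this post can be sketched as: read the counts, sort them by frequency, and fit a least-squares line to log(frequency) against log(rank). A slope near −1 is the classic Zipf’s-law signature. The helper also measures how much of all word usage the top 20% of words account for (the 80/20 idea). This is a quick diagnostic rather than a rigorous power-law test; the function names and the UTF-8 encoding of hiphop_words.txt are assumptions.

```python
from math import log


def read_word_counts(lines):
    """Parse 'word,count' lines; rsplit keeps any commas inside the word."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit(",", 1)
        counts[word] = int(count)
    return counts


def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(counts, reverse=True)
    xs = [log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def top_share(counts, fraction=0.2):
    """Share of all word uses accounted for by the most common `fraction` of words."""
    freqs = sorted(counts, reverse=True)
    k = max(1, round(len(freqs) * fraction))
    return sum(freqs[:k]) / sum(freqs)


# Usage, assuming hiphop_words.txt is UTF-8 text in the format above:
# with open("hiphop_words.txt", encoding="utf-8") as f:
#     counts = read_word_counts(f)
# print(loglog_slope(counts.values()), top_share(counts.values()))
```

Fitting a line on log-log axes is what the corrected plot does visually; a maximum-likelihood fit of the exponent is the more careful option if you want to publish the number.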

I was trying to think of a genuinely good use for the word list. The only thing that came to mind was password / penetration testing for computer hackers/crackers. Usually, they use a wordlist of some kind to test for common passwords, and the wordlist is generally from the dictionary or some corpus of text. This is just another corpus and wordlist for them to try.

More NBA Stats: Efficiency

January 14th, 2011

I was trying to find a good metric for judging NBA player performance, something that captured a lot of different game results, like OBP+SLG or VORP in baseball. There really aren’t many good metrics out there, so I decided to play around with a stat called Efficiency. It’s definitely not the most comprehensive stat, since it doesn’t account for minutes on the court, so I’m going to refine it a bit to Avg Efficiency / Min. The equation is:

[((Points + Rebounds + Assists + Steals + Blocks)
 - ((FG Att. - FG Made) + (FT Att. - FT Made) + Turnovers))] / mins
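The formula above translates directly into code: positive contributions minus misses and turnovers, divided by minutes played. The function name, argument names, and the sample stat line are hypothetical.

```python
def efficiency_per_minute(points, rebounds, assists, steals, blocks,
                          fg_att, fg_made, ft_att, ft_made, turnovers, minutes):
    """Efficiency divided by minutes played, per the formula above.

    Positive contributions minus missed field goals, missed free throws,
    and turnovers, all over total minutes on the court.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    positives = points + rebounds + assists + steals + blocks
    negatives = (fg_att - fg_made) + (ft_att - ft_made) + turnovers
    return (positives - negatives) / minutes


# Hypothetical single-game line: 20 pts, 10 reb, 5 ast, 1 stl, 1 blk,
# 8-of-15 FG, 4-of-6 FT, 3 turnovers in 36 minutes.
print(efficiency_per_minute(20, 10, 5, 1, 1, 15, 8, 6, 4, 3, 36))
```

Applied to season totals instead of one game, the same division reproduces the avg_eff column in the table below (e.g. Kevin Love’s 1265 efficiency over about 1673 minutes is about 0.756).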

Therefore, a good player should have a higher efficiency per minute, whereas a weaker player should have a lower value. So far this season, the mean number of minutes played is 536, so let’s only look at players with more minutes on the court than the mean. We can see our top 10 players:

id                   name      mins team       eff   avg_eff
50             kevin_love 1672.9667 MIN       1265 0.7561418
178         carlos_boozer  740.7667 CHI        535 0.7222247
481         dwight_howard 1471.2333 ORL       1055 0.7170854
12       amare_stoudemire 1572.7500 NYK       1111 0.7064060
67              pau_gasol 1739.4000 LAL       1217 0.6996666
230         blake_griffin 1555.6500 LAC       1086 0.6981005
438            tim_duncan 1239.1833 SAS        858 0.6923915
392            al_horford 1516.0333 ATL       1017 0.6708296
296         kevin_garnett 1073.1000 BOS        719 0.6700214
114          lebron_james 1665.6500 MIA       1091 0.6549995

As we can see, player average efficiency is distributed pretty normally with a mean of 0.40 and standard deviation of 0.12.  This should not be really surprising.

What about overall team efficiency? We would expect that having a higher overall team average efficiency should lead to a better record. We can plot this:

This is also a somewhat unsurprising result: teams with higher overall efficiency tend to win more games (at least in 2011). There seems to be a strong linear correlation (0.75, p-value = 0.000002) between efficiency and the number of wins minus losses. There are a few interesting outliers: Milwaukee (MIL) and Minnesota (MIN).

You can download the data here:

Player Efficiency (16KB) | Team Efficiency (1KB)

Tim Donaghy Corrections & Updated Analysis – Still Negative Finding

January 3rd, 2011

This morning, I received an email from Professor Sean Griffin, who is authoring a book and runs a blog about the NBA betting scandal I blogged about last weekend. He informed me that the assumptions I made in my analysis were wrong! I’m always happy to hear that anyone reads my blog, so thanks for emailing me, Sean!

Here are a few things that I had wrong.

  • There exists proof from the FBI that Donaghy was betting on games all the way back to the 03-04 NBA season. My source (wikipedia) claimed it was only the 2005-06 and 2006-07 seasons. So my analysis (and RJ Bell’s?) comparing his older games to his supposedly “fixed” games is moot. In fact, he bet on way more games than I had originally assumed (somewhere over 120!)
  • There does not exist definitive proof that he was fixing the games.  The NBA and Donaghy himself claim that he was still an impartial judge of the games.  I abused language a little bit. Sean Griffin’s blog has better and more comprehensive treatment of this matter.
  • Donaghy was not betting on the over/under spread as Wikipedia claimed. Way to go Wikipedia.  He was in fact betting on sides.  This is what I’m going to examine below.

Side Betting

First, let’s at least assume the Wikipedia page on spread betting is correct. So, betting on sides works like this. Before the game, a bookmaker sets a “spread” on the game.  The spread is like a handicap in golf, for example. If the game is Detroit v. Chicago, and the spread is -13 for Detroit, then they are the favorite by 13 points and you win if the following is true:

(# points for Detroit) - 13 > (# of points for Chicago)

Likewise, you can also bet on the underdog. In which case, you win the bet if the following inequality is true:

(# points for Chicago) + 13 > (# of points for Detroit)
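The two inequalities above can be sketched as a single function. One case they leave out: if the favorite wins by exactly 13, neither inequality holds. That is a “push”, which sportsbooks typically refund, and it is why many lines are set on a half point (e.g. −13.5). The function name and string labels are hypothetical.

```python
def spread_result(favorite_pts, underdog_pts, line):
    """Which side of a point-spread bet wins.

    `line` is the favorite's spread as a negative number (e.g. -13 when
    Detroit is favored by 13). The favorite covers when
    favorite_pts + line > underdog_pts; the underdog covers when
    underdog_pts - line > favorite_pts. If neither holds, the game
    lands exactly on the number (a "push").
    """
    adjusted = favorite_pts + line - underdog_pts
    if adjusted > 0:
        return "favorite"
    if adjusted < 0:
        return "underdog"
    return "push"


# Detroit (favored by 13) beats Chicago 100-80: Detroit covers.
print(spread_result(100, 80, -13))
```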

Disclaimer: I am not an expert on sports betting, so if you’re one of my readers and I’m wrong, submit a comment or send me an email. I did some preliminary analysis of how the spreads are set, and it’s very technical and incredibly accurate in aggregate: over the whole season, the bookmakers achieve an almost perfect 50/50 split in probabilities.

Analysis

We have to work around the following unknowns:

  • We don’t know which games Donaghy placed bets on. We know the overall quantity is somewhere greater than 120.
  • We don’t know which side of the line Donaghy bet on (the favorite or the underdog) or how much he bet.
  • We don’t know if his betting impacted the result of the game.

Here is my claim: if Donaghy was influencing the results of the game, then it is plausible that over the course of a game he may influence the score so that the result lands firmly on one side of the spread (the underdog) or the other (the favorite). For example, in our game from above, where the spread is -13 in favor of Detroit, a ref who bet on Detroit may give them as many opportunities to score as possible to ensure they beat the spread (e.g. they finish 20 points above Chicago). This contrasts with a referee who has no preference for a specific team, whose games should have a smaller difference from the spread.

We can compute this difference:

Spread_Difference = | (Actual Difference) - (Betting Spread) |
Spread_Difference = | (away score - home score) - (spread) |

Using our example (with Detroit as the home team), if Detroit wins by 20, then the spread difference is computed:

Spread_Difference = | (-20) - (-13) | = 7

Likewise, if Chicago won by 20, then the difference would be computed:

Spread_Difference = | (20) - (-13) | = 33
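The metric itself is a one-liner; the sketch below mainly pins down the sign convention that the worked examples imply: the spread is the home team’s line (negative when the home team is favored), which is also the expected value of away score minus home score. The function and argument names are hypothetical.

```python
def spread_difference(away_score, home_score, home_line):
    """|(away score - home score) - spread|, as defined above.

    `home_line` is the home team's spread (negative when the home team is
    favored), i.e. the expected value of away_score - home_score.
    """
    return abs((away_score - home_score) - home_line)


# Detroit at home, favored by 13 over Chicago.
print(spread_difference(80, 100, -13))  # Detroit wins by 20
print(spread_difference(100, 80, -13))  # Chicago wins by 20
```

Averaging this value over each group of games (Donaghy as lead official, Donaghy in any role, everyone else) gives the means compared in the boxplot.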

In both cases, a lower number indicates the game was closer to the bookmaker’s prediction, whereas a higher number indicates it was further away. Now, let’s compare three groups of games: the games Donaghy refereed, the games where Donaghy was the lead referee, and every other game in the set of games from 2002-2010.

We can see in the above boxplot (and from the stats) the following facts:

  • The medians for all three sets of data are approximately the same.
  • The mean difference for games where Donaghy was the lead official is less than the mean for games when he wasn’t.
  • There is no statistically significant difference in the means of the games where he was not the lead official (Donaghy in the plot above) and all games by all referees. (p-value = 0.5478)
  • There is a weakly significant difference in the games where he was the lead official versus all other games (the 2nd and 3rd boxes above).  Let’s just consider those two boxes.

We hypothesize the following: if Donaghy was in fact not swaying games, then there should be no statistically significant difference in the spread differences between the two groups of games.

Result: Like our previous analysis, we end up with an inconclusive result. There is a weakly significant difference between the two sets (p-value = 0.04223), but it is in the opposite direction from what we would expect: if he were swaying games, his games should have a higher mean spread difference, not a lower one. If he was swaying the outcomes of games, we certainly cannot tell by looking at this metric, either.

This result just confirms that the claim on Wikipedia was nonsense. Any claim of definitive statistical proof based on comparing game results to the spreads (either over/under OR sides) is probably not true. If you want to try for yourself: Download the data here! (1 MB)

NBA Referee Analysis: The Refeffect

December 31st, 2010

With all this talk about referees – we have to ask the question: Is every referee the same? Can we predict some of the variance in scores based on the referee of the game?

The answer to this question is: yes, sometimes. We can do a linear regression and ask: to what degree does the total number of points depend on the 1st official in the game (we’re ignoring the 2nd and 3rd officials for this one)? The table below lists the referees that have a statistically significant impact on the total score of the game. The Intercept value of 189.8724 is the baseline number of total points in a game, and the +/- points column is the increase or decrease a referee gives a game compared to that number.

 
(Intercept)              189.8724   
Referee Name        +/- Pts    Error    p-value
Bennett_Salvatore   6.1311     1.1849   2.32e-07 ***
Bill_Kennedy        9.5844     2.4492   9.16e-05 ***
Bill_Spooner        8.4406     1.4574   7.15e-09 ***
Bob_Delaney         3.7420     1.2122   0.00203 **
Dan_Crawford        6.5557     1.1957   4.27e-08 ***
Derrick_Stafford    6.1384     1.3461   5.16e-06 ***
Dick_Bavetta        5.2878     1.1888   8.74e-06 ***
Eddie_F._Rush       4.9395     1.2174   4.99e-05 ***
Greg_Willard        7.3311     1.3187   2.76e-08 ***
Jim_Clark           7.4736     1.3496   3.13e-08 ***
Joe_Crawford        4.0916     1.1805   0.00053 ***
Joe_DeRosa          7.4828     1.2193   8.68e-10 ***
Joe_Forte           7.9856     1.3543   3.81e-09 ***
Ken_Mauer           5.6227     1.3065   1.69e-05 ***
Marc_Davis         10.4655     2.5504   4.10e-05 ***
Mark_Wunderlich     6.5567     1.4661   7.81e-06 ***
Mike_Callahan       6.3255     1.2578   5.00e-07 ***
Monty_McCutchen     6.4767     1.3900   3.20e-06 ***
Olandis_Poole      30.1276     9.3430   0.00126 **
Rodney_Mott        16.4353     5.8287   0.00481 **
Ron_Garretson       3.1622     1.1680   0.00679 **
Scott_Foster        9.9592     1.4626   1.03e-11 ***
Tom_Washington      8.2859     1.4218   5.76e-09 ***
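A regression of total points on a single categorical predictor (the lead referee) has a closed form, which makes the table easy to sanity-check. With dummy coding and an intercept, the least-squares intercept is the baseline referee’s mean total, and each other referee’s coefficient is that referee’s mean total minus the baseline mean. The post doesn’t say which referee is the baseline, and this sketch reproduces only the point estimates, not the standard errors or p-values. The function name and sample games are hypothetical. The second regression below, which adds the over/under line as a covariate, has no such shortcut and needs a full multiple regression.

```python
from collections import defaultdict
from statistics import mean


def referee_effects(games, baseline):
    """Point estimates for total_points ~ lead referee (dummy coded).

    `games` is a list of (referee, total_points) pairs. Returns the
    intercept (the baseline referee's mean total) and a dict mapping every
    other referee to their mean total minus that intercept, which is what
    ordinary least squares gives for a single categorical predictor.
    """
    totals = defaultdict(list)
    for referee, total_points in games:
        totals[referee].append(total_points)
    intercept = mean(totals[baseline])
    effects = {ref: mean(pts) - intercept for ref, pts in totals.items() if ref != baseline}
    return intercept, effects


# Hypothetical games: (lead referee, total points scored by both teams).
games = [("Ref_A", 190), ("Ref_A", 192), ("Ref_B", 200), ("Ref_B", 204), ("Ref_C", 185)]
print(referee_effects(games, baseline="Ref_A"))
```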

As we can see, some referees definitely have a larger impact on the total score than others. Scott Foster, for example, averages about 200 total points per game (189.87 + 9.96) with a standard error of about 1.5, which is reasonably small. Another ref, Olandis Poole, has a much larger effect, but his standard error is also much larger: about 9 points.

Of course, this should be taken with a grain of salt: the fit is very poor (r^2 is around 0.02). When we also include the over/under line, i.e. what bettors think the total score will be, r^2 increases to 0.30 and only a few referees remain statistically significant. These are the referees whose games deviate significantly from the over/under line; positive values mean more points than the line predicts (Matt Boland is the lone negative):

(Intercept)                4.55755    2.68269   0.08937 .  
Spread Effect              0.97518    0.01366  < 2e-16 ***
Referee Name        +/- Pts    Error     p-value
Jim_Clark           3.61094    1.13257   0.00143 **
Joe_DeRosa          2.43230    1.02457   0.01761 *  
Joe_Forte           2.99994    1.13739   0.00836 **    
Matt_Boland       -35.98984   17.46057   0.03930 *    
Olandis_Poole      19.23714    7.83338   0.01407 *  
Pat_Fraher         16.69456    7.83198   0.03306 *    
Scott_Foster        2.44065    1.23054   0.04735 *    
Ted_Bernhardt       4.55581    1.61288   0.00474 **

After sifting through some blogs, it’s pretty apparent that Scott Foster has a reputation for this effect already, confirming some of these results.  As for the other referees, I don’t claim to be an expert and I’ve never bet on a game, so I have no idea if anyone knows about the effect of these guys.

Links

My Blog - I finally gave in and created a blog where I can post about whatever I like.

hiphop-networks.com - A website that I built while I was part of a fellowship at Northwestern. The original idea was to use the site to teach social networks to kids, and it sort of grew into its own awesome project.

Fun SI Projects

Using Bidding Networks to Search for Exposure in Auctions - Auction 73 Case - This is some work I did in Fall 2008, as a final project for my Networks course at SI. I'm currently trying to see if this is publishable.

Technological Diffusion with Compatibility - This is based on a model presented at one of UMichigan's STIET lectures this year.