HP Labs report predicting content popularity, & thus revenue

After picking up on HP Labs research on the value of paying attention to top contributors, I thought it would be interesting to check previous research from the same guys at Palo Alto on predicting the popularity of online content (pdf, or read it on Scribd). The abstract sums it up nicely, and it could be useful for planning a community growth strategy, for example:

“We present a method for accurately predicting the long time popularity of online content from early measurements of user’s access. Using two content sharing portals, Youtube and Digg, we show that by modeling the accrual of views and votes on content offered by these services we can predict the long-term dynamics of individual submissions from initial data.

“In the case of Digg, measuring access to given stories during the first two hours allows us to forecast their popularity 30 days ahead with remarkable accuracy, while downloads of Youtube videos need to be followed for 10 days to attain the same performance. The differing time scales of the predictions are shown to be due to differences in how content is consumed on the two portals: Digg stories quickly become outdated, while Youtube videos are still found long after they are initially submitted to the portal. We show that predictions are more accurate for submissions for which attention decays quickly, whereas predictions for evergreen content will be prone to larger errors.”

So let’s get down to the bullet points:

  • There is a linear relationship between the time it takes to consume contributor-generated content and the ability to predict its popularity.
  • There is a clear asymmetrical relationship at work — a few get a lot of attention. A ranking or rating mechanism supports this feature as the ‘rich get richer’.
  • As a side observation, only 3% of YouTube views come from incoming links. I assume that includes embedded videos on blogs, for example, but that’s not clearly stated.
  • The social network feature of Digg is key as fans get updates of what their favourite folk are reading and follow suit.
  • There is a key difference between Digg and YouTube popularity patterns, related to the content context. Digg content is often news-related and as such soon reaches its ‘sell by date’. In contrast, YouTube videos are not promoted to the front page in the same way as on Digg, and members find the content largely through search: “An important difference that is apparent in the figure is that while Digg stories saturate fairly quickly (in about one day) to their respective reference popularities, Youtube videos keep getting views all throughout their lifetime (at least throughout the data collection period, but it is expected that the trendline continues almost linearly). The rate at which videos keep getting views may naturally differ among videos: less popular videos in the beginning are likely to show a slow pace over longer time scales, too.”
  • Another side observation: it matters what time of day you post content to Digg. If it’s posted in the middle of the night for the majority of US readers, this will have an impact. I guess this is a general reminder to make sure content uploads on global communities take account of the various needs of readers, particularly that they are wide awake when content goes up!
  • Digg itself may not be perfect in promoting content. On average, about 11% of the content it promotes does not go on to generate sustained interest. Guess this means we are all on a learning curve ;-)
  • The maths supports the ‘the more popular content is early on, the more popular it will be later on’ rule of thumb:

[Figure: Popularity measures]
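The rule of thumb above is easy to check on your own community's data. Here is a minimal Python sketch, assuming you have an early count and a later count for each item. The numbers are made up for illustration and are not taken from the report, and the function name `log_pearson` is my own. It measures how closely the logarithm of early popularity tracks the logarithm of later popularity.

```python
import math

def log_pearson(early, late):
    """Pearson correlation between log(early) and log(late) counts."""
    xs = [math.log(e) for e in early]
    ys = [math.log(l) for l in late]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up counts for five hypothetical stories (NOT data from the report):
# votes after the first couple of hours, and votes a month later.
early_votes = [5, 12, 40, 90, 300]
late_votes = [60, 110, 500, 800, 3500]

r = log_pearson(early_votes, late_votes)
print(f"correlation of log counts: {r:.3f}")
```

A value close to 1 would suggest that, for your content, early attention is a good guide to later attention. Counts must be greater than zero for the logarithm to work.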

  • The researchers provide three models to predict a submission’s popularity at a time in the future. They favour the constant scaling model (CS) for relative measures, and the linear model (LN) for absolute measures. In conclusion, they suggest the error is smaller using the relative measure.
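To make the two favoured models a bit more concrete, here is a rough Python sketch of how I read them. It is a simplification under my own assumptions, not the report's exact formulas: the constant scaling factor is chosen to minimise the relative squared error, the "linear" model is fitted on log counts with the slope fixed at one, and all function names and numbers are hypothetical.

```python
import math

def fit_constant_scaling(early, late):
    """Factor alpha minimising sum((alpha * early / late - 1) ** 2)."""
    ratios = [e / l for e, l in zip(early, late)]
    return sum(ratios) / sum(r * r for r in ratios)

def fit_log_offset(early, late):
    """Offset b in ln(late) = ln(early) + b, fitted as the mean log ratio."""
    return sum(math.log(l / e) for e, l in zip(early, late)) / len(early)

def relative_squared_error(predicted, actual):
    """Mean of (predicted / actual - 1) ** 2 over all items."""
    return sum((p / a - 1) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Made-up training data for five hypothetical stories (NOT from the report).
train_early = [5, 12, 40, 90, 300]
train_late = [60, 110, 500, 800, 3500]

alpha = fit_constant_scaling(train_early, train_late)
offset = fit_log_offset(train_early, train_late)

# Forecast the later popularity of a new item that has 25 early votes.
new_early = 25
cs_forecast = alpha * new_early
ln_forecast = math.exp(offset) * new_early
print(f"CS forecast: {cs_forecast:.0f}, LN forecast: {ln_forecast:.0f}")
```

Both fits boil down to multiplying the early count by a learned factor. The difference is in which error the factor is tuned for: relative error in the CS sketch, error on the log scale in the LN sketch.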

My reading of their results is that relative measures are particularly useful for judging the revenue value of advertisements placed next to content, but not so good for content popularity. That is good news for YouTube’s advertisers, but bad news for helping Digg correct its 11% error of promoting content which in fact turns out not to be popular!

I guess this also supports my conclusions from the previous post: (1) have a strategy to support your top contributors, and (2) as part of this measurable strategy, make sure the means for them to gain attention work well. As I believe my sister company Sift Media does, you can then tie attention to payment to further reinforce this strategy’s influence.