Bayesian methods in ranking

2013-06-25 01:37:10

Many websites use a ranking system to determine the display order of content, especially for Q&A or news websites like Quora, Reddit, zhihu, that for a post/question, there might be multiple comments/answers, and the users could use the upvote or downvote buton to gradually change the displaying order.

From the simplest perspective, the number of upvote and down vote should decide the ranking of the response/answer. (There may be some extra factors needed to consider for ranking though.)

In this case, a Bayesian approach could be useful. Let us assume that the ranking is purely decided by the number of upvotes, while the number of upvotes is a Binomial distribution with n total users and voting probability p.

Now the question is: how to decide p appropriately? The Bayesian answer would be to calculate the conjugate prior probability and then update with data.

Before going further, let us take a look at the conjugate prior distribution diagram:

From this we know that the conjugate prior probability of p follows Beta distribution, $\mbox{Beta}(a,b)$, which pdf is proportional to the $p^{(a-1)}$ and $(1-p)^{(b-1)}$. And to update p with data we calculate the posterior probability:

$$p|\mbox{data} \sim \mbox{Beta}(a+upvotes, b+downvotes)$$

This simple method could basically show that:

  • more upvotes would high-rank the answer
  • more downvotes would low-rank the answer
  • the ranking would gradually change rather than suddenly jump to top or bottom.