Well, since one of the rare intra-rick disputes has been rickdugan’s criticism of yours truly’s inability to solve quantum mechanics, I shall try to provide a simple and useful lion-y formula along with a detailed explanation.
It is hard to do better than the mean of ratings. However, especially trusted and respected reviewers could be given a higher weight. One solution would be Nidan’s: when a review is approved, the adjudicator would click a box to nominate the reviewer for trusted status. Everybody starts with a base weight of one. If a reviewer gets some number of trusts (perhaps 5), their weight would initially increase to 2.
This leads to an obvious problem: some reviewers would build up an overwhelming weight. This can be corrected by shrinking the increment each time: whenever a reviewer earns another 5 trusts (or whatever cutoff is chosen), their weight increases by 1/(K * number of previous increases), where K is some constant, perhaps 2. Obviously, you would substitute 1 for the first increment.
So imagine that a reviewer has 15 trusts and K=2. Then their weight would be:
Score = 1 (the base weight) + 1 (first trust increment) + 0.5 (they’ve been increased once, so 1/(K*1) = 0.5) + 0.25 (now they’ve been increased twice, so 1/(K*2) = 0.25)
Thus, the overall reviewer weight is 2.75. I leave it to founder and the group to decide on the best value of K; the right choice depends on how often review adjudicators nominate reviewers for “good reviewer” status. This scheme has the desirable feature of allowing adjudicators to approve a review without nominating the reviewer.
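For concreteness, here is a minimal Python sketch of the trust-weight scheme, assuming one increment per 5 trusts and K = 2 as above (the function name and defaults are this lion’s own inventions, not anything TUSCL actually runs):

def reviewer_weight(trusts: int, k: float = 2.0, trusts_per_increment: int = 5) -> float:
    # Everybody starts with a base weight of 1.
    weight = 1.0
    # One increment is earned per block of 5 trusts.
    increments = trusts // trusts_per_increment
    for n in range(increments):
        if n == 0:
            weight += 1.0            # the first increment is a flat +1
        else:
            weight += 1.0 / (k * n)  # later increments shrink: 1/(K * times previously increased)
    return weight

print(reviewer_weight(15))  # 2.75, matching the worked example above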
Obviously, adjudicators would have only one nomination per review. This has two desirable features. First, it prevents any one adjudicator from excessively upweighting a reviewer. Second, it allows active reviewers to achieve higher weights, but only if they impress the adjudicators.
One could, of course, allow adjudicators to approve a review while registering it as “worth approving, but not very good”. These downvotes could then be used to calculate a multiplier M with a value < 1.
Final score = Score * M
The question, of course, is how to get the multiplier. This lion suggests:
M = 1 / 2^(#downvotes * D)
Where D is some constant, say 0.05. This would mean that one downvote yields
M = 1 / 2^(1*0.05) = 0.966
And 5 downvotes yield:
M = 1 / 2^(5*0.05) = 0.841
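A sketch of this multiplier in the same vein (again, the name and default D are this lion’s own):

def downvote_multiplier(downvotes: int, d: float = 0.05) -> float:
    # M = 1 / 2^(#downvotes * D)
    return 1.0 / 2 ** (downvotes * d)

print(round(downvote_multiplier(1), 3))  # 0.966
print(round(downvote_multiplier(5), 3))  # 0.841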
You might want to give reviewers a floor, i.e., some minimum value below which further downvoting would not diminish the reviewer’s weight. But this may not be necessary. After all, 100 downvotes would yield:
M = 1 / 2^(100*0.05) = 0.03125
This would be the weight of a reviewer with 100 downvotes and no upvotes, and perhaps it should be. Obviously, the values of the constants may need to be tweaked here, and a threshold for the downvotes could be included, e.g.:
< 5 downvotes: keep M = 1
≥ 5 downvotes: calculate M, but use (#downvotes - 4)
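A sketch of this thresholded variant, assuming the cutoff of 5 just described (the helper name is hypothetical):

def thresholded_multiplier(downvotes: int, d: float = 0.05, threshold: int = 5) -> float:
    # Below the threshold, no penalty at all.
    if downvotes < threshold:
        return 1.0
    # At or above it, only the downvotes beyond (threshold - 1) count.
    return 1.0 / 2 ** ((downvotes - (threshold - 1)) * d)

print(thresholded_multiplier(4))             # 1.0, no penalty below the threshold
print(round(thresholded_multiplier(5), 3))   # 0.966, counts as one effective downvote

The final reviewer weight would then be reviewer_weight(trusts) * thresholded_multiplier(downvotes), per the “Final score = Score * M” formula above.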
Note that any reliability metric, no matter how complex, can be gamed. This is something TUSCL will have to live with.
Note also that I considered logarithmic functions, but they performed poorly in simulations.
Now this lion will contemplate the notion of pseudocounts. Bayesian approaches, like pseudocounts, have both desirable and undesirable features. Like Fermat writing in a book’s margin, I will leave this for later. I just hope this doesn’t take this lion away from his groundbreaking work on M theory for too long. ROAR!!!