The method
uses no input other than the document scores of a standard
retrieval run, fits a mixture of (possibly truncated) normal and exponential distributions
(normal for relevant and exponential for non-relevant document scores), and
calculates the optimal score threshold given the estimated
distributions and their contributing weights.
The experiments confirm that the s-d method is effective for
determining thresholds, although there is still clear room for
improvement: the effectiveness varies considerably per topic,
with an average performance of 75-80% of
.
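As an illustration of the procedure summarized above, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes expectation-maximization (EM) as the fitting procedure, ignores truncation, shifts the exponential to start at the lowest score, and takes expected F1 as the measure to optimize; all of these choices, as well as the names `fit_mixture`, `best_threshold`, and `demo_scores` and the top-10% initialization, are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(scores, iterations=200):
    """Fit w*Normal(mu, sigma) + (1-w)*Exponential(lam) to scores with EM.

    Assumptions: no truncation; the exponential starts at the lowest score.
    Returns (w, mu, sigma, lam, shift).
    """
    shift = min(scores)
    xs = [s - shift for s in scores]
    n = len(xs)
    # Assumed initialization: treat the top 10% of scores as relevant.
    top = sorted(xs)[-max(2, n // 10):]
    mu = sum(top) / len(top)
    sigma = max(1e-6, math.sqrt(sum((x - mu) ** 2 for x in top) / len(top)))
    lam = n / max(1e-12, sum(xs))
    w = len(top) / n
    for _ in range(iterations):
        # E-step: posterior probability that each score is from the normal.
        resp = []
        for x in xs:
            rel = w * normal_pdf(x, mu, sigma)
            non = (1.0 - w) * lam * math.exp(-lam * x)
            resp.append(rel / (rel + non) if rel + non > 0.0 else 0.0)
        r = sum(resp)
        if r < 1e-9 or n - r < 1e-9:
            break  # one component has vanished; keep the last estimates
        # M-step: responsibility-weighted maximum-likelihood updates.
        w = r / n
        mu = sum(g * x for g, x in zip(resp, xs)) / r
        var = sum(g * (x - mu) ** 2 for g, x in zip(resp, xs)) / r
        sigma = max(1e-6, math.sqrt(var))
        lam = (n - r) / max(1e-12, sum((1.0 - g) * x for g, x in zip(resp, xs)))
    return w, mu + shift, sigma, lam, shift


def best_threshold(scores, w, mu, sigma, lam, shift):
    """Return the observed score that maximizes expected F1 under the fit."""
    n = len(scores)
    total_rel = n * w  # expected number of relevant documents in the run
    best_t, best_f1 = None, -1.0
    for t in scores:
        # Expected relevant and non-relevant documents scoring above t.
        tp = total_rel * 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))
        fp = n * (1.0 - w) * math.exp(-lam * (t - shift))
        f1 = 2.0 * tp / (tp + fp + total_rel)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def demo_scores(seed=0):
    """Synthetic run: 900 exponential (non-relevant) and 100 normal scores."""
    rng = random.Random(seed)
    non_relevant = [rng.expovariate(2.0) for _ in range(900)]
    relevant = [rng.gauss(5.0, 0.5) for _ in range(100)]
    return non_relevant + relevant
```

Calling `fit_mixture(demo_scores())` and passing the result to `best_threshold` returns a cut-off between the two synthetic score populations. Real retrieval scores are rarely this well separated, so the sketch only shows the mechanics of fitting and thresholding.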
Assuming that a normal-exponential mixture is a good approximation for score distributions and that no relevance information is available, we believe that the improved methods described in this paper a) are as general as possible, b) deal with most known theoretical anomalies and practical difficulties, and consequently c) bring us closer to the performance ceiling of s-d thresholding. If the effectiveness is deemed unsatisfactory, further improvements of s-d thresholding should come from using alternative mixtures or training data. Nevertheless, some other mixtures may be more difficult, or even impossible, to estimate.
This research was supported by the Netherlands Organization for Scientific Research (NWO, grants 612.066.513, 639.072.601, and 640.001.501).
avi (dot) arampatzis (at) gmail