The method uses no other input than the document scores of a standard retrieval run, fit a mixture of (possibly truncated) normal and exponential distributions (normal for relevant, and exponential for non-relevant document scores), and calculate the optimal score threshold given the estimated distributions and their contributing weight. The experiments confirm that the s-d method is effective for determining thresholds, although there is still clear room for improvement: the effectiveness varies considerably per topic, with an average performance of 75-80% of .
Assuming that a normal-exponential mixture is a good approximation for score distributions and that no relevance information is available, we believe that the improved methods described in this paper are a) as general as possible, b) they deal with most known theoretical anomalies and practical difficulties, and consequently, c) they bring us closer to the performance ceiling of s-d thresholding. If the effectiveness is deemed unsatisfactory, further improvements of s-d thresholding should come from using alternative mixtures or training data. Nevertheless, some other mixtures may be more difficult--or even impossible--to estimate.
This research was supported by the Netherlands Organization for Scientific Research (NWO, grant # 612.066.513, 639.072.601, and 640.001.501).
avi (dot) arampatzis (at) gmail