
"mean of empty slice" in spearman correlation calculation

Open · timokau opened this issue 6 years ago • 2 comments

During the tests, NumPy warns about a "mean of empty slice". This happens because the Spearman correlation calculation filters the labels it is applied to as follows:

https://github.com/kiudee/cs-ranking/blob/ba03234fb61a4e645b393d2d9ac81c0b85399024/csrank/metrics_np.py#L24

And then averages its results:

https://github.com/kiudee/cs-ranking/blob/ba03234fb61a4e645b393d2d9ac81c0b85399024/csrank/metrics_np.py#L29

Because of that filter, the results being averaged may be empty (or consist only of NaNs). What is the intention behind the filter?
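A minimal sketch that reproduces the warning. It assumes that the filter drops every instance whose predicted scores contain a tie and that the per-instance correlations are then averaged with `np.mean`. The function names are hypothetical and are not the ones used in `csrank/metrics_np.py`.

```python
import warnings

import numpy as np


def scores_to_ranks(scores):
    """Rank a 1-D score vector: 0 for the lowest score, n - 1 for the highest.

    With tied scores, argsort breaks the tie by position, so the ranks of the
    tied items are arbitrary.
    """
    return np.argsort(np.argsort(scores, kind="stable"), kind="stable")


def has_ties(scores):
    """True if the 1-D score vector contains duplicate values."""
    return np.unique(scores).size < scores.size


def mean_spearman_excluding_ties(true_ranks, pred_scores):
    """Hypothetical metric: mean Spearman rho over instances without tied predictions.

    true_ranks, pred_scores: arrays of shape (n_instances, n_objects), with
    true_ranks oriented like scores_to_ranks (an assumption for this sketch).
    """
    n_objects = true_ranks.shape[1]
    rhos = []
    for ranks, scores in zip(true_ranks, pred_scores):
        if has_ties(scores):
            continue  # the filter: instances with tied predictions are dropped
        d = ranks - scores_to_ranks(scores)
        # Closed-form Spearman rho, which is exact only when there are no ties.
        rhos.append(1.0 - 6.0 * np.sum(d ** 2) / (n_objects * (n_objects ** 2 - 1)))
    # If every instance was filtered out, rhos is empty: np.mean emits
    # "RuntimeWarning: Mean of empty slice." and returns nan.
    return np.mean(rhos)


true_ranks = np.array([[0, 1, 2], [2, 1, 0]])
pred_scores = np.array([[0.5, 0.5, 0.9], [0.1, 0.1, 0.1]])  # every row contains a tie

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = mean_spearman_excluding_ties(true_ranks, pred_scores)

print(result)  # nan
print([str(w.message) for w in caught])  # includes 'Mean of empty slice.'
```

When at least one instance survives the filter, the mean is well defined; the warning only appears when the filter removes every instance in the batch.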

CC @prithagupta

timokau, Nov 14 '19 20:11

The filter is applied to remove instances with ties in the prediction. Ties are problematic when calculating the Spearman correlation and can introduce a non-negligible bias. That said, I also think the current code could be improved; at the very least, the user should get a warning.
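A minimal sketch of the warning suggested above. It makes the following assumptions: the tie filter is kept, Spearman's rho is computed with the closed form on tie-free instances, and the function name is hypothetical. The function reports how many instances were dropped and returns `nan` explicitly instead of letting `np.mean` warn about an empty slice.

```python
import warnings

import numpy as np


def mean_spearman_warn_on_ties(true_ranks, pred_scores):
    """Hypothetical variant: skip instances with tied predictions, but say so.

    true_ranks, pred_scores: arrays of shape (n_instances, n_objects); ranks
    are assumed to increase with the score (an assumption for this sketch).
    """
    n_instances, n_objects = pred_scores.shape
    rhos = []
    for ranks, scores in zip(true_ranks, pred_scores):
        if np.unique(scores).size < n_objects:
            continue  # tied predictions: excluded from the average
        pred_ranks = np.argsort(np.argsort(scores))
        d = ranks - pred_ranks
        # Closed-form Spearman rho for one tie-free instance.
        rhos.append(1.0 - 6.0 * np.sum(d ** 2) / (n_objects * (n_objects ** 2 - 1)))
    n_dropped = n_instances - len(rhos)
    if n_dropped:
        # Tell the user explicitly instead of silently shrinking the sample.
        warnings.warn(
            f"{n_dropped} of {n_instances} instances have tied predictions "
            "and were excluded from the Spearman correlation.",
            RuntimeWarning,
        )
    if not rhos:
        return float("nan")  # nothing left to average; avoids np.mean([])
    return float(np.mean(rhos))
```

Called on a batch in which every instance has tied predictions, this returns `nan` with a single explicit warning rather than NumPy's generic one. Whether `nan` or raising an error is the better behaviour is a design decision for the library.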

Here is a paper discussing several methods for dealing with ties: https://www.tandfonline.com/doi/full/10.1080/02664763.2015.1043870
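Instead of dropping instances, ties can be handled inside the correlation itself. One common convention, and the one `scipy.stats.spearmanr` uses, gives tied items the average of the ranks they span (midranks) and then takes the Pearson correlation of those ranks. This sketch does not claim that midranks are the method the linked paper recommends. The NumPy-only code below uses hypothetical function names. It shows that once a tie appears in the predictions, the midrank value no longer matches the closed-form formula.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace each group of tied values by the mean of its positional ranks.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_midranks(x, y):
    """Spearman rho as the Pearson correlation of average ranks (handles ties)."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def spearman_closed_form(x, y):
    """1 - 6 * sum(d^2) / (n * (n^2 - 1)); matches spearman_midranks only without ties."""
    n = len(x)
    d = average_ranks(x) - average_ranks(y)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))


true_scores = np.array([1.0, 2.0, 3.0, 4.0])
pred_scores = np.array([1.0, 1.0, 3.0, 4.0])  # the first two predictions are tied

print(spearman_midranks(true_scores, pred_scores))     # approx. 0.9487
print(spearman_closed_form(true_scores, pred_scores))  # approx. 0.95
```

Without ties, the two functions agree. With a single tie among four items, they already disagree in the third decimal place.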

kiudee, Nov 18 '19 08:11

@kiudee @timokau Even the script version takes ties into consideration, but we need to check how that implementation handles them. As far as I remember, we removed it because it was not a correct or efficient way of evaluating the Spearman correlation.

prithagupta, Nov 18 '19 11:11