Enhancement request: be able to do hubness analysis with different metrics

Open ivan-marroquin opened this issue 4 years ago • 5 comments

Hi,

From a previous issue, I learned that the package should be able to conduct hubness analysis with several metrics (including fractional norms).

So, I tried to use a fractional norm with the following code:

    from skhubness.data import load_dexter
    from skhubness import Hubness

    X, y = load_dexter()
    hub = Hubness(k=10, return_value='all', metric='minkowski',
                  algorithm='hnsw', algorithm_params={'p': 0.1},
                  hubness='local_scaling', random_state=1969, n_jobs=-1)
    hub.fit(X)

which gave the error below:

    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Users\IMarroquin\Downloads\Important_Python_Libraries_VisualBuildTools\scikit-hubness-master\skhubness\analysis\estimation.py", line 283, in fit
        raise ValueError(f"Unknown metric '{metric}'. "
    ValueError: Unknown metric 'minkowski'. Must be one of ['euclidean', 'cosine', 'precomputed'].

According to the nmslib documentation, that package supports several metrics (including fractional norms).

I think it would be beneficial to be able to run hubness analysis with a choice of metric.

Thanks,

Ivan

ivan-marroquin • Jun 08 '21 14:06

Here is the link to the issue I mentioned above: https://github.com/VarIr/scikit-hubness/issues/67

ivan-marroquin • Jun 08 '21 14:06

IIRC, nmslib's HNSW does not support any metric besides Euclidean and cosine, but please feel free to point me to documentation that states otherwise.

However, this code seems to fail on a check in skhubness that might not be necessary at this point. It would also fail for algorithm="brute", which it shouldn't. I would need to look into this in detail.

As a workaround, you could calculate the fractional distances ahead of time and use metric="precomputed".
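A minimal sketch of that workaround, using NumPy only. The helper name `fractional_distances` and the random demo data are made up for illustration. The commented-out `Hubness` call at the end is an assumption about how the precomputed matrix would be passed; the exact expectations for `metric="precomputed"` should be checked against the skhubness docs.

```python
import numpy as np


def fractional_distances(X, p=0.1):
    """Pairwise distances (sum_j |x_j - y_j|**p) ** (1/p) for p > 0.

    For 0 < p < 1 this is a fractional "norm" (not a true metric,
    since the triangle inequality does not hold).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.empty((n, n))
    for i in range(n):
        # One row at a time keeps memory at O(n * d) instead of O(n^2 * d).
        D[i] = (np.abs(X - X[i]) ** p).sum(axis=1) ** (1.0 / p)
    return D


# Demo on random data (a stand-in for the real feature matrix).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8))
D = fractional_distances(X_demo, p=0.1)

# Assumed follow-up, not verified here:
# from skhubness import Hubness
# hub = Hubness(k=10, metric='precomputed')
# hub.fit(D)
```

The row-wise loop is slower than a fully broadcast version, but it avoids building an n × n × d intermediate array, which matters for high-dimensional data.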

VarIr • Jun 08 '21 15:06

Hi @VarIr ,

Thanks for the prompt answer. Here is the nmslib documentation on distances: https://github.com/nmslib/nmslib/blob/master/manual/spaces.md

I will try the proposed workaround.

Ivan

ivan-marroquin • Jun 08 '21 15:06

Indeed, while optimized indices are only available for Euclidean and cosine, many more spaces are supported in general.

For personal reference, the detailed list of supported spaces is available in the manual, Table 1, p. 5.

VarIr • Jun 09 '21 07:06

Thanks for sharing the document.

ivan-marroquin • Jun 09 '21 16:06