pynndescent
pynndescent copied to clipboard
pynndescent is changing the input data, is this expecte?
It seems if the metirc is dot, the input data will be normalized, is this expected? https://github.com/lmcinnes/pynndescent/blob/master/pynndescent/pynndescent_.py#L743
Can confirm–this happens if the metric is dot
and the input data is np.float32
. If it's different dtype, the check_array
call on L684 has already made a copy of the input. This is definitely a bug.
I have fixed the aspect that @jamestwebber pointed out -- we shouldn't alter the users data, that's definitely a bug.
With regard to "dot" as a metric -- it is essentially assuming you really want something like an angular or cosine distance, and, indeed, the actual cosine distance exploits this and uses dot under the hood. Is there a use case for dot product distance measures with non-normalized data? We can always add a different name to handle that.
I have fixed the aspect that @jamestwebber pointed out -- we shouldn't alter the users data, that's definitely a bug.
With regard to "dot" as a metric -- it is essentially assuming you really want something like an angular or cosine distance, and, indeed, the actual cosine distance exploits this and uses dot under the hood. Is there a use case for dot product distance measures with non-normalized data? We can always add a different name to handle that.
I guess there are use cases of dot product with non-normalized data, and the "dot" name itself is not implicating there is a normalization process there.
Is there a use case for dot product distance measures with non-normalized data?
Personally I have no idea, but this is probably a separate issue from the bug that you fixed.
@troilus-canva : That makes sense I guess; could you suggest a naming convention you would like>
@troilus-canva : That makes sense I guess; could you suggest a naming convention you would like>
normalize_dot?
Was a true un-normalized dot product metric ever added? This would be useful in cases where the original embedding was created with this in mind (for example, a matrix factorization).