faiss icon indicating copy to clipboard operation
faiss copied to clipboard

random generator produces the same number sequence with the same seed

Open CaucherWang opened this issue 3 years ago • 7 comments

Summary

In random.h, struct RandomGenerator, random generator produces the same number sequence with the same seed. I found this problem in NSG.cpp, search_on_grapth function.

Platform

Faiss version:e7b9d5514bc41109a0c502bcb6dc72ac196b287a

Reproduction instructions

It is easy to be reproduced.

CaucherWang avatar Sep 22 '22 02:09 CaucherWang

What did you expect?

mdouze avatar Sep 23 '22 15:09 mdouze

What did you expect?

Would it be better to generate random vertex sequences with different queries? A fixed group of "random" vertexes that fill the candidate pool seems a little wired. I'm not sure whether it will do harm to the performance or not.

CaucherWang avatar Sep 24 '22 07:09 CaucherWang

Right, in NSG this is actually a valid concern. @KinglittleQ what do you think?

mdouze avatar Sep 28 '22 12:09 mdouze

The original intent was to make the search process deterministic, which will produce the same search result given the same query. This deterministic does not harm the accuracy according to my evaluation.

I think your concern is reasonable. Maybe we could find a way to generate random sequences depending on the query point.

KinglittleQ avatar Sep 28 '22 13:09 KinglittleQ

Assuming x is the query vector, we could set the random seed to *(unsigned int*)&(sum(x)).

KinglittleQ avatar Sep 28 '22 13:09 KinglittleQ

There is a hash function that could be applied to queries: https://github.com/facebookresearch/faiss/blob/151e3d7be54aec844b6328dc3e7dd0b83fcfa5bc/faiss/utils/utils.h#L168

mdouze avatar Sep 28 '22 13:09 mdouze

Thank you both. Maybe @KinglittleQ can change this line in the next pr, if you think it's more reasonable.

CaucherWang avatar Sep 30 '22 08:09 CaucherWang