autofaiss

x8 vs x4fsr

Open OmniscienceAcademy opened this issue 2 years ago • 2 comments

INFO:autofaiss: Computing best hyperparameters for index faiss_titles.faiss 05/05/2022, 07:16:53                                                            
WARNING:autofaiss:The maximum nearest neighbors coverage is 10.65% for this index. It means that when requesting 20 nearest neighbors, the average number of retrieved neighbors will be 2. The program will try to find the best hyperparameters to reach 95% of this max coverage at least, and then will optimize the search time for this target. The index search speed could be higher than the requested max search speed.

What can we do to prevent this?

This happened with the index_key "OPQ768_768,IVF262144_HNSW32,PQ768x8", which gave the bad max coverage. With the index_key "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr", everything was OK; the vectors were just a bit too compressed.

My d is 768.

Thank you

OmniscienceAcademy avatar May 08 '22 13:05 OmniscienceAcademy

Hello!

Autofaiss will do a binary search to find the best set of hyperparameters and will set the lower exploration bound given the targeted minimum number of neighbors to retrieve (20 according to the logs).
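To make the idea concrete, here is a minimal sketch of such a binary search over a single integer parameter (for example, the number of clusters probed). It is not autofaiss's actual implementation: `smallest_param_reaching` and `toy_coverage` are hypothetical names, the coverage function is a toy stand-in for a real measurement, and it is assumed to be non-decreasing in the parameter. The 6144 upper bound and the "95% of the max coverage" target are taken from this thread.

```python
from typing import Callable, Optional


def smallest_param_reaching(
    coverage_at: Callable[[int], float],  # hypothetical: measured coverage for a parameter value
    low: int,
    high: int,
    target: float,
) -> Optional[int]:
    """Return the smallest value in [low, high] whose coverage is >= target.

    Assumes coverage_at is non-decreasing. Returns None if even `high`
    falls short of the target.
    """
    if coverage_at(high) < target:
        return None
    while low < high:
        mid = (low + high) // 2
        if coverage_at(mid) >= target:
            high = mid      # mid is good enough, look for something smaller
        else:
            low = mid + 1   # mid is too small
    return low


# Toy stand-in: coverage grows with the parameter and saturates at 10.65%.
def toy_coverage(nprobe: int) -> float:
    return 0.1065 * min(1.0, nprobe / 64)


max_coverage = toy_coverage(6144)   # coverage at the upper exploration bound
target = 0.95 * max_coverage        # "95% of this max coverage", as in the log
best = smallest_param_reaching(toy_coverage, low=1, high=6144, target=target)
print(best)
```

Note that when the coverage at the upper bound is itself low, the search still converges, but only to a fraction of that low ceiling, which is what the warning in the log reports.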

In your case, it seems that the function estimating the output coverage returned 10.65% for the upper bound of the exploration window (see code: get_nearest_neighbors_coverage). I see several possible explanations:

  • Your index is nearly empty, and you only ran the tune_index function on an existing index.
  • Your index is not empty, but nearly all the vectors in it contain null values, so the search function cannot retrieve them.
  • You might have created your own custom clusters, and the 6144 closest clusters (the upper bound of the exploration) are empty most of the time for the first 100s of vectors in the index.
  • The vectors used to compute the coverage (the first 100s in the index) are outliers; some may not be searchable because they contain null values.

In order to help you more, I would need more context on the commands you used to get these results. For now, you should try to estimate the coverage of your index with the get_nearest_neighbors_coverage function and analyze the output :)
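As a rough illustration of what a coverage figure like this measures, the sketch below computes it from the label matrix that a Faiss search returns, where missing neighbors are padded with -1. This is not the code of get_nearest_neighbors_coverage, whose signature is not shown in this thread; `neighbors_coverage` is a hypothetical helper and the labels are made-up toy data.

```python
from typing import Sequence


def neighbors_coverage(labels: Sequence[Sequence[int]], k: int) -> float:
    """Fraction of the k requested neighbor slots that hold a real result.

    `labels` is one row of neighbor ids per query; a slot equal to -1 means
    no neighbor was found for that slot.
    """
    if not labels or k <= 0:
        raise ValueError("need at least one query and k > 0")
    found = sum(1 for row in labels for label in row[:k] if label != -1)
    return found / (len(labels) * k)


# Toy data: two queries with k = 4. The first finds one neighbor, the second none.
labels = [[17, -1, -1, -1], [-1, -1, -1, -1]]
coverage = neighbors_coverage(labels, k=4)
print(coverage)        # 1 filled slot out of 8
print(coverage * 4)    # average number of neighbors retrieved per query
```

The same arithmetic explains the warning in the log: a coverage of 10.65% with 20 requested neighbors gives 0.1065 × 20 ≈ 2.13, hence "the average number of retrieved neighbors will be 2". Inspecting which queries return mostly -1 can help tell an empty or unsearchable region apart from a genuinely bad index.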

PS: Autofaiss doesn't support the x4fsr variant of these indices. As you can see here in the code, tuning an "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr" index would raise a NotImplementedError.

victor-paltz avatar May 09 '22 10:05 victor-paltz

Thank you for your kind help and this extensive answer.

My index is not empty, and my nvectors is correct. The weird thing is that the only change I made in my script was the encoding: x8 instead of x4fsr. I will continue to investigate on my side.

Yeah, I know that you do not support it. I forked your repo and modified build_index so that it can build any valid index_key, even if that means there is no hyperparameter tuning.

OmniscienceAcademy avatar May 10 '22 20:05 OmniscienceAcademy