hnswlib
Problem with building several graphs in a loop if num_threads > 1
Hello everyone
I have a problem with a large dataset. I have 4 million vectors and am trying to build 80 graphs of 50 thousand elements each in a loop, using the add_items function with num_threads=2 (or 3, 4, any value other than 1). After 2 or 3 graphs are built, everything stops without any error.
I tried adding the 50 thousand elements one by one, and that works, but then I get different search results when I query the graphs.
How can I fix these problems?
Environment: Docker image based on python3.7, 100+ GB RAM, hnswlib version 0.6.2
Thanks in advance
Hi @starminalush,
Can you please share sample code with some dummy data so we can debug?
Hi @yurymalkov, thanks for the answer. I resolved the problems I reported above. The problem was in my data: if the same embedding appears under different labels, the graph is built incorrectly or is not built at all. Could you please add error messages to the library? I didn't see my mistake until I started writing sample code with dummy data.
@starminalush What could be the criteria for detecting that something is wrong with the input?
Sorry for the late response. Something is wrong with the input if several labels have the same embedding. In that case, calling add_items with num_threads > 1 doesn't work.
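A check like the one described could be run on a batch before it is passed to add_items. The following is only a minimal sketch of that idea, not something hnswlib provides: the helper name find_duplicate_embeddings is hypothetical, and it only catches exact duplicates (bit-identical vectors), not near-duplicates.

```python
from collections import defaultdict


def find_duplicate_embeddings(vectors, labels):
    """Return groups of labels whose embeddings are exactly identical.

    `vectors` can be any iterable of rows (lists or NumPy array rows);
    `labels` is the matching iterable of ids. Hypothetical helper, not part of hnswlib.
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        # A tuple of floats is hashable, so identical rows land in the same bucket.
        key = tuple(float(x) for x in vec)
        groups[key].append(label)
    # Keep only buckets that hold more than one label.
    return [group for group in groups.values() if len(group) > 1]
```

If the returned list is not empty, the batch contains several labels with the same embedding, and the caller can decide whether to raise an error or drop the repeats before building the graph.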
Oh, I see. I'm not sure why it would hang in this case. Duplicates are known to reduce recall and cause some issues, but it should not freeze. Can you please provide a simple example so we can debug?
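A reproduction attempt could look roughly like the sketch below. It is not the reporter's code: the helper names, the vector dimension, the number of duplicates, and the ef_construction and M values are all assumptions chosen for illustration, and it is not confirmed that this triggers the freeze. Plain Python lists are passed to add_items here; converting to a float32 NumPy array first is the more common pattern.

```python
import random


def make_dummy_batch(n_items, dim, n_duplicates, seed=0):
    """Return (vectors, labels); the last n_duplicates vectors repeat earlier ones."""
    n_unique = n_items - n_duplicates
    if n_unique <= 0:
        raise ValueError("n_duplicates must be smaller than n_items")
    rng = random.Random(seed)
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n_unique)]
    # Repeat existing vectors under new labels to mimic the reported input.
    for i in range(n_duplicates):
        vectors.append(list(vectors[i % n_unique]))
    labels = list(range(n_items))
    return vectors, labels


def build_graphs(n_graphs, n_items, dim, n_duplicates, num_threads=2):
    """Build several independent indexes in a loop, as described in the report."""
    import hnswlib  # imported here so make_dummy_batch works without hnswlib installed

    indexes = []
    for g in range(n_graphs):
        vectors, labels = make_dummy_batch(n_items, dim, n_duplicates, seed=g)
        index = hnswlib.Index(space="l2", dim=dim)
        # ef_construction and M are assumed values, not taken from the report.
        index.init_index(max_elements=n_items, ef_construction=200, M=16)
        index.add_items(vectors, labels, num_threads=num_threads)
        indexes.append(index)
    return indexes
```

Calling something like build_graphs(80, 50_000, dim, n_duplicates) with the real embedding dimension would mirror the setup from the first post. If the issue reproduces, that call may hang, so it is safer to start with a small n_graphs and compare num_threads=1 against num_threads=2.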