Problem building several graphs in a loop if num_threads > 1

Open starminalush opened this issue 2 years ago • 5 comments

Hello everyone

I have a problem with a large dataset. I have 4 million vectors and am trying to build 80 graphs with 50 thousand elements each, in a loop, using the add_items function with num_threads=2 (or 3, 4, any value other than 1). But after I build 2 or 3 graphs, everything stops without errors.

I tried adding the 50 thousand elements one by one and that works, but in that case I get different search results when I search the graphs.

How can I fix these problems?

Environment: Docker image based on python3.7, 100+ GB RAM, hnswlib version 0.6.2

Thanks in advance

starminalush avatar Jul 01 '22 11:07 starminalush

Hi @starminalush,

Can you please share sample code with dummy data so we can debug?

yurymalkov avatar Jul 05 '22 19:07 yurymalkov

Hi @yurymalkov, thanks for the answer. I resolved the problems I reported above. The problem was in my data: if the same embedding appears under different labels, the graph is built incorrectly or is not built at all. Could you please add error messages to the library? I didn't notice my mistake until I started writing sample code with dummy data.

starminalush avatar Jul 13 '22 12:07 starminalush

@starminalush What can be the criteria to understand that there is something wrong with the input?

yurymalkov avatar Jul 14 '22 04:07 yurymalkov

Sorry for the late response. The input is wrong when several labels share the same embedding. In that case, calling add_items with num_threads > 1 doesn't work.
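
One way to check for this condition before calling add_items might look like the sketch below. It is not part of hnswlib; the helper name `find_duplicate_embeddings` is hypothetical, and it only catches exact duplicates (near-duplicates would need a distance threshold).

```python
from collections import defaultdict


def find_duplicate_embeddings(vectors, labels):
    """Group labels by identical embedding.

    Returns a list of label groups, one per embedding that appears
    under more than one label. Hypothetical helper, not an hnswlib API.
    """
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        # A tuple of floats is hashable and works for plain lists or NumPy rows.
        key = tuple(float(x) for x in vector)
        groups[key].append(label)
    return [group for group in groups.values() if len(group) > 1]


# Dummy data: labels 10 and 12 share the same embedding.
vectors = [[0.1, 0.2], [0.3, 0.4], [0.1, 0.2], [0.5, 0.6]]
labels = [10, 11, 12, 13]
print(find_duplicate_embeddings(vectors, labels))  # [[10, 12]]
```

An empty result means no embedding is shared between labels; a non-empty one lists the labels to deduplicate before building the graph.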

starminalush avatar Jul 18 '22 19:07 starminalush

Oh, I see. Not sure why it should hang in this case. Duplicates are known to reduce recall and cause some issues, but it should not freeze. Can you please provide a simple example so we can debug?
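
A starting point for such an example could be the sketch below. It is not confirmed to reproduce the hang. It only mirrors the setup described in this thread: several graphs built in a loop, each containing identical embeddings under different labels, added with num_threads=2. The vector dimension, the `l2` space, `ef_construction=200`, `M=16`, the share of duplicates, and both helper names are assumptions, since none of them were given in the report.

```python
import random


def make_batch_with_duplicates(num_elements, dim, num_duplicates, seed=0):
    """Return (vectors, labels) where the last `num_duplicates` rows are
    copies of the first rows, each under its own label. Hypothetical helper."""
    if not 0 <= num_duplicates < num_elements:
        raise ValueError("num_duplicates must be in [0, num_elements)")
    rng = random.Random(seed)
    unique = num_elements - num_duplicates
    vectors = [[rng.random() for _ in range(dim)] for _ in range(unique)]
    # Append exact copies of earlier rows; they receive new labels below.
    vectors += [list(vectors[i % unique]) for i in range(num_duplicates)]
    labels = list(range(num_elements))
    return vectors, labels


def build_graphs(num_graphs, num_elements, dim, num_threads=2):
    """Build several indexes in a loop, as described in the report."""
    # Imported here so the data helper above can be used without hnswlib installed.
    import numpy as np
    import hnswlib

    graphs = []
    for g in range(num_graphs):
        vectors, labels = make_batch_with_duplicates(
            num_elements, dim, num_elements // 10, seed=g
        )
        index = hnswlib.Index(space="l2", dim=dim)  # space is an assumption
        index.init_index(max_elements=num_elements, ef_construction=200, M=16)
        index.add_items(
            np.asarray(vectors, dtype=np.float32),
            np.asarray(labels),
            num_threads=num_threads,
        )
        graphs.append(index)
    return graphs
```

Calling `build_graphs(num_graphs=5, num_elements=50_000, dim=128)` would approximate the reported workload; running it again with `num_threads=1` gives a baseline for comparison.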

yurymalkov avatar Jul 24 '22 00:07 yurymalkov