ef tuning

Open prateekpatelsc opened this issue 3 years ago • 4 comments

@yurymalkov: The README says that ef during search should be greater than K. Is this a strict requirement? I have tested some datasets where topK is ~1k and ef is set to 100, and I still get results, but I am wondering what issues I could run into. As expected, making ef large slows down search. Are there any other recommendations for improving recall without taking a big latency hit from ef?

Also, for a given ef_construction, changing M does not improve recall at all; in fact, recall is very low (< 20%). Any thoughts on what I can try to improve recall? The documentation says that a topM search should give a recall of about 90%, and that ef_construction should be increased if it does not, but this didn't help.

prateekpatelsc, Feb 11 '22 19:02

The ef >= K condition is enforced inside the bindings: ef is increased if it is set too low. Otherwise, with the stop condition used, there is no guarantee that the search will actually find K elements.
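A minimal sketch of the behaviour described above, assuming the effective search width is simply the larger of ef and K. The function name `effective_ef` is made up for illustration and is not part of hnswlib:

```python
def effective_ef(ef: int, k: int) -> int:
    """Return the search width that would actually be used.

    Assumption: a too-small ef is raised to k, as described in the comment above.
    """
    if ef < 1 or k < 1:
        raise ValueError("ef and k must be positive")
    return max(ef, k)


print(effective_ef(100, 1000))  # 1000: ef=100 with K=1000 behaves like ef=1000
print(effective_ef(200, 10))    # 200: ef is already large enough
```

Under this assumption, setting ef to 100 with topK ~1k does not make the search cheaper than ef = 1000 would.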

For the second question - do you get bad recall regardless of ef?

yurymalkov, Feb 11 '22 22:02

Thanks for the clarification. So this implies that searching for a larger topK (top 1000) and then taking the closest 100 among them can give a different result than searching for the top 100 directly. The first search, with the larger topK, may produce a more accurate top 100.

Regarding the second question, yes: I tried increasing ef_construction while keeping M constant, following the recommendation that "there is room for improvement" if recall is < 0.9 when searching top M in the index. But I don't really see an improvement; recall saturates.

prateekpatelsc, Feb 11 '22 22:02

Yes, that is how it works internally.
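To compare the two approaches, one could measure recall of the first k results against an exact (brute-force) top-k. The sketch below uses plain Python and made-up id lists rather than real query output; it assumes the approximate results are sorted closest first, and the helper name `recall_at_k` is hypothetical:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that appear in the first k approximate ids."""
    if k < 1:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        raise ValueError("exact_ids must not be empty")
    found = set(approx_ids[:k])
    return len(found & exact) / len(exact)


# Illustrative placeholder ids, not measurements.
exact_top4 = [10, 11, 12, 13]                 # ground truth from a brute-force scan
direct_k4 = [10, 11, 20, 21]                  # hypothetical result of searching with k=4
wide_k8 = [10, 11, 12, 20, 21, 22, 23, 24]    # hypothetical result of searching with k=8

print(recall_at_k(direct_k4, exact_top4, 4))  # 0.5
print(recall_at_k(wide_k8, exact_top4, 4))    # 0.75, after truncating the wider result to 4
```

With real data, the id lists would come from the index query and from an exact scan over the same vectors.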

The second part I do not fully understand. If M and ef_construction are fixed, does the recall go up to, say, 0.9 as ef increases?

yurymalkov, Feb 11 '22 22:02

What I was trying was: build the index with a given ef_construction and a fixed M; then, if recall for topM with ef = ef_construction is < 0.9, increase ef_construction and build a new index. I thought this is what the README meant as the way to build a better-quality index?
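The procedure described above can be sketched as a loop. This is only an outline under stated assumptions: `measure_recall` is a hypothetical callback that rebuilds the index with the given ef_construction (M fixed) and returns the topM recall at ef = ef_construction, and the start value, growth factor and limit are arbitrary. The two lookup tables stand in for real builds and contain made-up numbers:

```python
def tune_ef_construction(measure_recall, start=100, factor=2, limit=1600, target=0.9):
    """Increase ef_construction until recall reaches `target` or `limit` is passed.

    Returns (chosen ef_construction or None, list of (ef_construction, recall)).
    """
    history = []
    ef_construction = start
    while ef_construction <= limit:
        recall = measure_recall(ef_construction)  # hypothetical: rebuild + measure
        history.append((ef_construction, recall))
        if recall >= target:
            return ef_construction, history
        ef_construction *= factor
    return None, history


# Placeholder recall curves, not measurements.
improving = {100: 0.62, 200: 0.81, 400: 0.93}
saturating = {100: 0.15, 200: 0.18, 400: 0.19, 800: 0.19, 1600: 0.19}

print(tune_ef_construction(improving.__getitem__))
# (400, [(100, 0.62), (200, 0.81), (400, 0.93)])
print(tune_ef_construction(saturating.__getitem__)[0])  # None
```

Keeping the history makes a saturating curve, like the one reported in this thread, visible at a glance instead of silently looping.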

When increasing ef on a fixed index (M, ef_construction), recall does increase, but isn't that expected, since you will eventually end up searching the entire graph?
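One way to use an ef sweep on a fixed index is to record recall and latency at several ef values and keep the smallest ef that reaches the target recall. A minimal sketch, assuming the sweep has already been measured; the helper name and the numbers are made up for illustration:

```python
def smallest_ef_meeting(sweep, target_recall):
    """Pick the row with the smallest ef whose recall reaches the target.

    `sweep` is a list of (ef, recall, latency_ms) tuples; returns None if no row qualifies.
    """
    passing = [row for row in sweep if row[1] >= target_recall]
    return min(passing, key=lambda row: row[0]) if passing else None


# Placeholder sweep, not measurements.
sweep = [
    (50, 0.71, 0.4),
    (100, 0.84, 0.7),
    (200, 0.92, 1.3),
    (400, 0.97, 2.5),
    (800, 0.99, 4.9),
]

print(smallest_ef_meeting(sweep, 0.90))   # (200, 0.92, 1.3)
print(smallest_ef_meeting(sweep, 0.999))  # None
```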

prateekpatelsc, Feb 11 '22 23:02