hnswlib
ef tuning
@yurymalkov: the README says that ef during search should be greater than K. Is this a strict requirement? I have tested some datasets where topK is ~1k and ef is set to 100, and I am able to get results, but I am wondering what sort of issues I can run into. Making ef large slows down search, as expected. Are there any other recommendations for improving recall without taking a big latency hit from ef? Also, for a given ef_construction, changing M does not improve recall at all; in fact, recall is very low (< 20%). Any thoughts on what I can try to improve recall? In the documentation I read that searching for top M should give a recall of 90%, and if not, to keep increasing ef_construction, but this didn't help.
This ef >= K requirement is enforced inside the bindings: ef is increased if it is set too low. Otherwise there is no guarantee that the search will actually find K elements with the stop condition it uses.
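The clamp described above can be sketched in a few lines. This is only an illustration of the behavior, not the library's own code: `query_with_min_ef` is a hypothetical helper, and `index` is assumed to be any object exposing `set_ef(ef)` and `knn_query(data, k=...)`, as an hnswlib Python index does.

```python
def query_with_min_ef(index, queries, k, ef):
    """Run a k-NN query, never letting ef drop below k.

    Hypothetical helper: `index` is assumed to expose set_ef() and
    knn_query(), as an hnswlib Python index does.
    """
    # A requested ef below k is raised to k, mirroring the clamp above.
    effective_ef = max(ef, k)
    index.set_ef(effective_ef)
    labels, distances = index.knn_query(queries, k=k)
    return effective_ef, labels, distances
```

With K = 1000 and ef = 100, as in the question, the effective ef would be 1000, so the search is already paying the cost of the larger value.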
For the second question: do you get bad recall regardless of ef?
Thanks for the clarification. So this implies that searching for a larger topK (top 1000) and then keeping the 100 closest among them gives a different result than searching for the top 100 directly. The first search, with the larger topK, may produce a more accurate top 100.
Regarding the second question, yes: I tried increasing ef_construction while keeping M constant and followed the recommendation that "there is room for improvement if your recall is < 0.9 on searching top M in the index". But I don't really see an improvement; it saturates.
Yes, that is how it works internally.
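The over-fetch-then-truncate idea discussed above could look roughly like this. It is a minimal sketch on toy data: `keep_closest` is a hypothetical helper, and the labels and distances are made up to stand in for the result of one query fetched with a larger K.

```python
def keep_closest(labels, distances, k):
    """Return the k (label, distance) pairs with the smallest distances."""
    # Sort explicitly by distance so the helper does not depend on the
    # order in which the search returned its results.
    pairs = sorted(zip(labels, distances), key=lambda pair: pair[1])
    return pairs[:k]


# Toy result for a single query, as if fetched with a larger K.
labels = [7, 3, 9, 1]
distances = [0.40, 0.10, 0.90, 0.25]
closest_two = keep_closest(labels, distances, 2)
```

The larger fetch is what drives the cost here: by the ef >= K rule, a top-1000 search runs with ef of at least 1000 even if only 100 results are kept.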
The second part I do not fully understand. If M and ef_construction are fixed, does the recall go up to, say, 0.9 as ef increases?
What I was trying was: build the index with a given ef_construction and a fixed M; then, if the recall for top M with ef = ef_construction is < 0.9, increase ef_construction and build a new index. I thought this is what the README meant by building a better-quality index?
When increasing ef with a fixed index (M, ef_construction), recall does increase, but isn't that expected, since you will eventually end up searching the entire graph?
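For reference, the recall figures discussed in this thread can be computed per query as below. This is a sketch under assumptions: `recall_at_k` is a hypothetical helper, and the exact neighbor ids are assumed to come from a brute-force search over the same data.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact nearest neighbors that the approximate search found."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# Toy example: two of the four true neighbors were returned.
toy_recall = recall_at_k([1, 2, 3, 4], [1, 2, 5, 6])
```

Recording this value at a few ef settings that are still small relative to the number of indexed elements may help separate a genuinely better graph from a search that simply visits most of it.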