
Cannot return the results in a contigious 2D array. Probably ef or M is too small

Open prateekpatelsc opened this issue 3 years ago • 3 comments

@yurymalkov: I'm trying to understand when this can happen. I have an index with a few hundred thousand elements and no deletions. My topK is in the range 100-500, and search_ef is ~400.

Could you please elaborate on the scenarios in which the algorithm can run into this case? What is the correct way to handle it? Increasing M leads to larger index sizes and slower search times, so I am not too inclined to go this route.
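One way to handle the error without touching M is to raise the search-time ef and retry. The following is only a minimal sketch of that idea, not a recommendation from the library author. It assumes the Python bindings, where `set_ef` and `knn_query` exist and the failure surfaces as a `RuntimeError` carrying the message from the issue title; the helper name, the doubling factor, and the `ef_max` cap are hypothetical. If the underlying cause is lost graph connectivity rather than a small ef, retrying may not help.

```python
def knn_query_with_ef_retry(index, queries, k, ef_start, ef_max, growth=2):
    """Run index.knn_query, growing ef whenever fewer than k results are found.

    `index` is assumed to expose set_ef(ef) and knn_query(data, k=...),
    as the hnswlib Python bindings do. Returns (query_result, ef_used).
    """
    ef = max(ef_start, k)  # ef below k cannot yield k results
    while True:
        index.set_ef(ef)
        try:
            return index.knn_query(queries, k=k), ef
        except RuntimeError as err:
            # Re-raise unrelated errors, and give up once the cap is reached.
            if "Cannot return the results" not in str(err) or ef >= ef_max:
                raise
            ef = min(ef * growth, ef_max)
```

A call might look like `(labels, distances), ef_used = knn_query_with_ef_retry(index, queries, k=500, ef_start=400, ef_max=4096)`. Logging `ef_used` shows how often the retry is triggered, which helps decide whether a larger default ef is the cheaper fix.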

prateekpatelsc avatar Feb 16 '22 02:02 prateekpatelsc

Is this because the search gets stuck in a local minimum where no neighbors improve the distance and the search queue is empty?

prateekpatelsc avatar Feb 16 '22 02:02 prateekpatelsc

Also, is this somehow related to data dimensionality? For example, for a fixed M, search ef, and ef_construction, are these errors more likely with high-dimensional data or with data of similar scale but lower dimension?

prateekpatelsc avatar Feb 16 '22 02:02 prateekpatelsc

Hi @prateekpatelsc,

That might be connected to duplicates in the data. Duplicates are useless for search: each component of duplicates can be replaced with a single element, which decreases the index size and complexity. One can control them on the client side by doing a search before inserting a batch. Incremental graph-based approaches can suffer from duplicates because the graph loses connectivity if the largest duplicate component is much larger than the number of links M. (It is also hard to check for duplicates for inner product inside the algorithm.)

Can you check whether the data has large components of duplicates?
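A client-side check along these lines could be sketched as follows. This is an illustration under stated assumptions rather than the method used inside hnswlib: it detects only exact duplicates (after rounding) by hashing each vector, which swaps in a simpler technique for the search-before-insert approach mentioned above. The function names and the `decimals` tolerance are hypothetical.

```python
from collections import Counter


def _key(vector, decimals):
    # Round so that vectors differing only by float noise share one key.
    return tuple(round(float(x), decimals) for x in vector)


def largest_duplicate_component(vectors, decimals=6):
    """Return the size of the largest group of identical vectors."""
    counts = Counter(_key(v, decimals) for v in vectors)
    return max(counts.values(), default=0)


def deduplicate(vectors, decimals=6):
    """Return indices of the first occurrence of each distinct vector."""
    seen = set()
    kept = []
    for i, v in enumerate(vectors):
        key = _key(v, decimals)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


if __name__ == "__main__":
    M = 16  # hypothetical value; use the M the index was built with
    data = [[0.1, 0.2]] * 40 + [[0.3, 0.4], [0.5, 0.6]]
    largest = largest_duplicate_component(data)
    if largest > M:
        print(f"largest duplicate component ({largest}) exceeds M ({M})")
    print("rows to insert:", deduplicate(data))
```

If the largest component turns out to be much larger than M, inserting only the rows returned by `deduplicate` keeps one representative per component, and the caller can map the dropped rows back to their representative after the search.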

yurymalkov avatar Feb 17 '22 05:02 yurymalkov