hnswlib

feature: autoresize

Open louisabraham opened this issue 4 years ago • 2 comments

There should be a feature that grows the index automatically when an insertion would exceed its capacity.

Example implementation:

needed = index.get_current_count() + len(to_add)
if needed > index.get_max_elements():
    # Grow geometrically, but always by enough to fit the new batch.
    index.resize_index(max(needed, 2 * index.get_max_elements()))

On that note, what is the time complexity of a resize?

louisabraham, Dec 13 '20 23:12

Agreed, that would be nice! The complexity of a resize is linear in the size of the dataset. Essentially it is an allocation, a copy and a deallocation.

It may be worth writing a Python wrapper over the class to support it. On the other hand, ideally it should be done in C++. The technical problem is that resize is not thread-safe with respect to insertion (e.g. other threads, including Python ones, need to finish inserting before the copy starts). This might become part of a larger overhaul of synchronization.
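A minimal sketch of the Python wrapper idea, not an hnswlib feature: the class name `AutoResizeIndex` and the `growth_factor` parameter are hypothetical, and the wrapper only assumes the wrapped object exposes `get_current_count()`, `get_max_elements()`, `resize_index()` and `add_items()`. A single lock serializes the resize and the insert, so no insertion made through the wrapper can overlap a resize. It does not protect calls made directly on the underlying index, and it gives up parallelism between separate `add_items` calls.

```python
import threading


class AutoResizeIndex:
    """Hypothetical wrapper that grows the wrapped index before it overflows."""

    def __init__(self, index, growth_factor=2):
        self._index = index
        self._growth = growth_factor
        # One lock for both resize and insert: they never run concurrently
        # as long as every insertion goes through this wrapper.
        self._lock = threading.Lock()

    def add_items(self, data, ids=None):
        with self._lock:
            # Upper bound on the final count: assumes every item is new.
            needed = self._index.get_current_count() + len(data)
            capacity = self._index.get_max_elements()
            if needed > capacity:
                # Grow geometrically, but at least enough for this batch.
                self._index.resize_index(max(needed, self._growth * capacity))
            self._index.add_items(data, ids)
```

With a real index this would be used as `AutoResizeIndex(index).add_items(vectors)`. Queries that may run concurrently with an insert would need to take the same lock, which this sketch leaves out.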

yurymalkov, Dec 14 '20 06:12

That is good news for the complexity: with a doubling strategy, autoresize would not change the asymptotic complexity of insertion and would only add a small constant factor (between 2 and 4).
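The amortized argument can be checked with a small simulation. This is a sketch under one assumption: that a resize costs one unit per element already stored, following the "linear in the size of the dataset" answer above. The function `copied_elements` is hypothetical and does not touch hnswlib. It counts how many elements are copied in total when the capacity doubles each time the index is full.

```python
def copied_elements(n_inserts, initial_capacity=1):
    """Total elements copied by resizes while inserting one item at a time."""
    capacity, count, copied = initial_capacity, 0, 0
    for _ in range(n_inserts):
        if count == capacity:
            copied += count  # assumed cost: a resize copies every stored element
            capacity *= 2    # doubling strategy
        count += 1
    return copied
```

Starting from a capacity of 1, the copies form the sum 1 + 2 + 4 + ... over powers of two below n, which stays under 2n. The resize overhead is therefore linear in the number of insertions overall, so it is a constant amount per insertion on average.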

Not sure how to solve the synchronization problem.

louisabraham, Dec 14 '20 07:12