hnswlib icon indicating copy to clipboard operation
hnswlib copied to clipboard

valgrind problems when searching a reloaded index

Open jlmelville opened this issue 2 years ago • 3 comments

I recently attempted to update the R bindings for hnswlib but there is a valgrind problem of the form:

==217201== Conditional jump or move depends on uninitialised value(s)
==217201==    at 0x188D9AB5: hnswlib::HierarchicalNSW<float>::searchKnn(void const*, unsigned long) const (packages/tests-vg/RcppHNSW/src/../inst/include/hnswalg.h:1150)

This should also be triggerable via the second Python bindings example on the current README (the code after "An example with updates after serialization/deserialization").

The problem line: https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/hnswalg.h#L1150

I believe the issue is that not all HierarchicalNSW constructors are initializing all of their data members. Specifically num_deleted_ is a problem: while it can be incremented in loadIndex under circumstances where isMarkDeleted returns true, it's never actually assigned. In the example above, there isn't any deletion so the problem doesn't become apparent until searchKnn is invoked.

I haven't checked for the validity of other class member variables so other problems may be lurking. I can certainly help with providing a PR or helping with testing, but I haven't updated the R bindings for a couple of years so more recent functionality isn't tested (yet). So this will for sure need some extra eyes on it.

jlmelville avatar Jul 17 '22 19:07 jlmelville

Hi @jlmelville,

Yeah, it does not seem right. A PR would be awesome, I can review it. Please let me know if there are other options for me to help.

yurymalkov avatar Jul 22 '22 06:07 yurymalkov

I am making a start on this, but leaving some notes here in case I fall victim to the usual combination of too many good intentions and too little free time.

I used clang-tidy to hunt down any issues around non-initialized fields in constructors (ignoring all the modernizing/readability/style checks). It found these issues:

bruteforce.h

https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/bruteforce.h#L11 and https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/bruteforce.h#L14

constructor does not initialize these fields: data_, maxelements_, cur_element_count, size_per_element_, data_size_, dist_func_param_

hnswalg.h

https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/hnswalg.h#L20 and https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/hnswalg.h#L23

constructor does not initialize these fields: max_elements_, cur_element_count, size_data_per_element_, size_links_per_element_, num_deleted_, M_, maxM_, maxM0_, ef_construction_, mult_, revSize_, maxlevel_, visited_list_pool_, enterpoint_node_, size_links_level0_, offsetData_, offsetLevel0_, data_level0_memory_, linkLists_, data_size_, label_offset_, dist_func_param_, metric_distance_computations, metric_hops, ef_

https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/hnswalg.h#L27

 constructor does not initialize these fields: metric_distance_computations, metric_hops

Odd but unrelated things clang-tidy noted

These are not related to the constructor issue so I wouldn't put them in a PR for this issue, but may be worth looking at @yurymalkov:

In https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/hnswalg.h#L387:

parameter 'data_point' is unused

In https://github.com/nmslib/hnswlib/blob/443d667478fddf1e13f2e06b1da4e1ec3a9fe716/hnswlib/bruteforce.h#L26

suspicious exception object created but not thrown; did you mean 'throw runtime_error'?

jlmelville avatar Jul 26 '22 05:07 jlmelville

Thanks for pointing out the problems! I'll fix them after merging #396

yurymalkov avatar Aug 03 '22 05:08 yurymalkov