hnswlib icon indicating copy to clipboard operation
hnswlib copied to clipboard

Get data points of the first layer

Open DmitryKey opened this issue 4 years ago • 9 comments

Hi, would it be possible to somehow get the list of data points from the first layer of the HNSW graph?

Context:

  1. I'm storing HNSW graphs on disk in different files
  2. When a query comes in, I would like to quickly scan the closest points of the graphs without resorting to the full graph file yet.
  3. Once I have a short list of the candidate points, I will initiate the HNSW based search of top K in each corresponding file

In my algorithm there is a so-called seed point that is used to group the points into future HNSW files, but probably it is not the best one for the efficient search?

DmitryKey avatar Oct 06 '21 08:10 DmitryKey

Hi @DmitryKey Yes, you can do that via C++ interface. If you want to access it via python that would require adding bindings for that functionality.

yurymalkov avatar Oct 10 '21 05:10 yurymalkov

Hi @yurymalkov thanks! Can you share a bit more detail on the C++ API part for doing this? I'm operating in Python, but if this would prove possible, could try adding bindings.

DmitryKey avatar Oct 10 '21 15:10 DmitryKey

Would it be any of the following methods:

        linklistsizeint *get_linklist(tableint internal_id, int level) const {
            return (linklistsizeint *) (linkLists_[internal_id] + (level - 1) * size_links_per_element_);
        };

        linklistsizeint *get_linklist_at_level(tableint internal_id, int level) const {
            return level == 0 ? get_linklist0(internal_id) : get_linklist(internal_id, level);
        };

in hnswalg.h ?

DmitryKey avatar Oct 10 '21 17:10 DmitryKey

@yurymalkov can you please show me how to recompile the python bindings? Looks like the test case failing cannot see my new method

DmitryKey avatar Oct 13 '21 12:10 DmitryKey

Would it be any of the following methods:

Yeah. That should work.

can you please show me how to recompile the python bindings?

The bindings can be installed to the python environment by running pip install . in the root of the library. The tests are run by python -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py"

yurymalkov avatar Oct 13 '21 16:10 yurymalkov

@yurymalkov thanks for the hint.

I have a question -- trying to output graph as text -- how can I access the graph nodes and their edges? Target is to print them as text adjacency matrix.

DmitryKey avatar Oct 15 '21 18:10 DmitryKey

Basically trying to do this: https://github.com/DmitryKey/hnswlib/commit/615e2a5b528a4cf6ed4eb887b43214d64f2850eb

Getting (with line numbers on the left):

    1 0
    2 10000
    3 0
    4 0
    5 0
    6 0
    7 0
    8 0
    9 10000
   10 10000
   11 204
   12 196
   13 132
   14 3
   15 6342
   16 16
   17 32
   18 16
   19 0.360674
   20 100

Next follows binary graph, and target is to represent it as text. Sorry for changing the topic of this ticket!

DmitryKey avatar Oct 15 '21 19:10 DmitryKey

@DmitryKey You can access them by get_linklist_at_level method e.g.

data = get_linklist_at_level(currObj, level); 
int size = getListCount(data); // size is the number of links of currObj at level level
tableint *datal = (tableint *) (data + 1); // datal is the pointer to the array of links
                                           // the size counter and the data reside next to each other in the memory, so it is +1
for (int i = 0; i < size; i++) { // iterating over the links
   tableint link = datal[i]; 
   ...
}

yurymalkov avatar Oct 17 '21 04:10 yurymalkov

thanks @yurymalkov gonna try it and let you know how it went

DmitryKey avatar Oct 17 '21 16:10 DmitryKey