Get data points of the first layer
Hi, would it be possible to somehow get the list of data points from the first layer of the HNSW graph?
Context:
- I'm storing HNSW graphs on disk in different files
- When a query comes in, I would like to quickly scan the closest points of the graphs without resorting to the full graph file yet.
- Once I have a short list of the candidate points, I will initiate the HNSW based search of top K in each corresponding file
In my algorithm there is a so-called seed point that is used to group the points into future HNSW files, but probably it is not the best one for the efficient search?
Hi @DmitryKey Yes, you can do that via C++ interface. If you want to access it via python that would require adding bindings for that functionality.
Hi @yurymalkov thanks! Can you share a bit more detail on the C++ API part for doing this? I'm operating in Python, but if this would prove possible, could try adding bindings.
Would it be any of the following methods:
linklistsizeint *get_linklist(tableint internal_id, int level) const {
return (linklistsizeint *) (linkLists_[internal_id] + (level - 1) * size_links_per_element_);
};
linklistsizeint *get_linklist_at_level(tableint internal_id, int level) const {
return level == 0 ? get_linklist0(internal_id) : get_linklist(internal_id, level);
};
in hnswalg.h ?
@yurymalkov can you please show me how to recompile the python bindings? Looks like the test case failing cannot see my new method
Would it be any of the following methods:
Yeah. That should work.
can you please show me how to recompile the python bindings?
The bindings can be installed to the python environment by running pip install . in the root of the library. The tests are run by python -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py"
@yurymalkov thanks for the hint.
I have a question -- trying to output graph as text -- how can I access the graph nodes and their edges? Target is to print them as text adjacency matrix.
Basically trying to do this: https://github.com/DmitryKey/hnswlib/commit/615e2a5b528a4cf6ed4eb887b43214d64f2850eb
Getting (with line numbers on the left):
1 0
2 10000
3 0
4 0
5 0
6 0
7 0
8 0
9 10000
10 10000
11 204
12 196
13 132
14 3
15 6342
16 16
17 32
18 16
19 0.360674
20 100
Next follows binary graph, and target is to represent it as text. Sorry for changing the topic of this ticket!
@DmitryKey
You can access them by get_linklist_at_level method e.g.
data = get_linklist_at_level(currObj, level);
int size = getListCount(data); // size is the number of links of currObj at level level
tableint *datal = (tableint *) (data + 1); // datal is the pointer to the array of links
// the size counter and the data reside next to each other in the memory, so it is +1
for (int i = 0; i < size; i++) { // iterating over the links
tableint link = datal[i];
...
}
thanks @yurymalkov gonna try it and let you know how it went