annoy

What data is stored in leaf nodes and split nodes?

Open AjitAntony opened this issue 4 years ago • 0 comments

Hi,

I'm trying to understand how Annoy works and have read the code for _make_tree. Since I'm not from a C++ background, I'm struggling to figure out the logic of what is stored in leaf nodes and split nodes. Your blog was very insightful, but I couldn't understand the points below.

The question may sound silly, but if you could share these details it would be really helpful. We feed only the vectors and their ids (assume the index number) to Annoy.

  1. What technique does Annoy use to compress vectors so that they are represented in fewer dimensions?

  2. What is stored in a leaf node?

> For nodes with n_descendants == 1 the vector is a data point.

In https://github.com/erikbern/ann-presentation all nodes have 2 descendants. Can you tell me which nodes have n_descendants == 1 in the MNIST tree diagram shown in that repo?

In the blog above you showed a small sample tree (https://erikbern.com/assets/2015/09/tree-2-graphviz1-300x203.png). Assuming that is the final tree, does the 60 in a leaf node mean it stores 60 points? If so, are those points the indices of the vectors, vectors compressed into a smaller dimension, or the 60 original full-dimensional vectors of the respective 60 points in the orange hyperplane?

> For nodes with n_descendants > K the vector is the normal of the split plane.

What does "the vector is the normal of the split plane" mean? I'm unable to find an explanation in other blogs or on Google.
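For context, the "normal" of a hyperplane is a vector perpendicular to it: the plane is the set of points x with dot(normal, x) + offset = 0, and the sign of dot(normal, x) + offset says which side of the plane x lies on. The sketch below is a minimal, simplified illustration assuming Euclidean distance and a plane placed halfway between two points `p` and `q`; the function names `split_plane` and `margin` are hypothetical, and in Annoy itself the two points are chosen by a two-means style procedure rather than passed in directly.

```python
import math


def split_plane(p, q):
    """Plane halfway between p and q, perpendicular to the segment from q to p.

    Returns (normal, offset) such that the plane is {x : dot(normal, x) + offset == 0}.
    """
    diff = [pi - qi for pi, qi in zip(p, q)]
    length = math.sqrt(sum(d * d for d in diff))
    normal = [d / length for d in diff]  # unit vector pointing from q towards p
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Choose the offset so that the plane passes through the midpoint of p and q
    offset = -sum(n * m for n, m in zip(normal, midpoint))
    return normal, offset


def margin(normal, offset, x):
    """Signed distance of x from the plane; > 0 means x is on p's side."""
    return sum(n * xi for n, xi in zip(normal, x)) + offset


normal, offset = split_plane([2.0, 0.0], [0.0, 0.0])
print(normal, offset)                      # [1.0, 0.0] -1.0
print(margin(normal, offset, [3.0, 5.0]))  # 2.0  (same side as p)
print(margin(normal, offset, [0.5, 1.0]))  # -0.5 (same side as q)
```

In this toy example the plane is the vertical line x = 1: only the normal (plus an offset) is needed to describe it, not the points on either side.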

3. What is stored in a split node/root node? Does it store a list of all the child nodes/child points below it in the hierarchy, or does it store a vector, for example the mean of all the vector points under it? In the sample tree image, does the split node shown in dark blue store the indices of 255 (157 + 98 = 255) points?

4. During search, if we feed in a vector, how does it traverse the tree down to a leaf node? Does the query vector do a similarity check against all the split nodes, assuming a split node represents the mean of the vectors under it?
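The quoted comments describe three kinds of nodes. The sketch below models them in Python under a toy in-memory layout; the class names, the value of `K` and the `descend` helper are hypothetical, and Annoy actually stores all nodes as fixed-size records in one contiguous array. In this model a node with one descendant holds that item's vector, a node with 2 to K descendants holds only the list of their ids, and a split node holds only the plane normal, an offset and its two children (no list of points and no mean). A query therefore reaches the bottom of a tree with one dot product per split node on its path, not one per split node in the tree. This is a simplified single-tree descent: Annoy's real search keeps a priority queue over all trees and also explores the far side of splits, prioritized by how close the query is to each plane.

```python
from dataclasses import dataclass
from typing import List, Union

K = 3  # hypothetical bucket capacity; Annoy derives K from how many ids fit in a vector's space


@dataclass
class Item:       # n_descendants == 1: the stored vector is the data point itself
    item_id: int
    vector: List[float]


@dataclass
class Bucket:     # 2 <= n_descendants <= K: no vector, only the ids of the descendants
    item_ids: List[int]


@dataclass
class Split:      # n_descendants > K: the vector is the normal of the split plane
    normal: List[float]
    offset: float
    left: "Node"   # side where the margin is <= 0
    right: "Node"  # side where the margin is > 0


Node = Union[Item, Bucket, Split]


def descend(node: Node, query: List[float]) -> List[int]:
    """Follow one root-to-bottom path, computing one dot product per split node visited."""
    while isinstance(node, Split):
        margin = sum(n * q for n, q in zip(node.normal, query)) + node.offset
        node = node.right if margin > 0 else node.left
    if isinstance(node, Item):
        return [node.item_id]
    return node.item_ids  # only candidates; their real distances are computed afterwards


# Example: 7 items, K = 3; the root splits on x = 1, its right child on y = 1
tree = Split(
    normal=[1.0, 0.0], offset=-1.0,
    left=Bucket(item_ids=[0, 1, 2]),
    right=Split(
        normal=[0.0, 1.0], offset=-1.0,
        left=Item(item_id=3, vector=[3.0, 0.0]),
        right=Bucket(item_ids=[4, 5, 6]),
    ),
)

print(descend(tree, [0.2, 0.3]))  # [0, 1, 2]
print(descend(tree, [3.0, 0.5]))  # [3]
print(descend(tree, [2.0, 4.0]))  # [4, 5, 6]
```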

AjitAntony, Jan 06 '21 19:01