kdtree

Updating the tree / avoiding loading the entire tree into RAM

Open · splitbrain opened this issue 2 years ago · 3 comments

I'm wondering if I could use this implementation to efficiently store OpenAI embedding vectors for my documents. However, from what I understand of the documentation, the tree has to be built completely in RAM before it can be persisted to the filesystem and then searched more efficiently. It also seems that an already created tree can't be updated or extended.

Would it be feasible to implement updating/extending the tree on disk?

splitbrain, Jun 08 '23 07:06

It seems my assumption that the search would be more memory efficient was false. Some data, in case others are interested:

  • Items in the tree: 13375
  • Dimensions: 1536
  • .bin file size for FKDTree: 157 MB
  • Peak memory during tree creation: 550 MB
  • Peak memory during nearest neighbor search: 510 MB

It's super fast, but unfortunately the memory requirements aren't compatible with most hosted PHP setups.
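A quick sanity check on those numbers, assuming (the thread doesn't say) that each coordinate is written as an 8-byte double: the raw vectors alone come to about 157 MiB, in line with the reported .bin size, so the file appears to be mostly the coordinates themselves. The 32-bit line is only a hypothetical comparison; nothing in the thread suggests the library supports single-precision storage.

```python
# Back-of-envelope estimate of the raw vector payload reported above.
ITEMS = 13375       # items in the tree (from the comment)
DIMENSIONS = 1536   # embedding dimensions (from the comment)


def raw_size_bytes(items: int, dims: int, bytes_per_value: int) -> int:
    """Bytes for the coordinates alone, ignoring node/index overhead."""
    return items * dims * bytes_per_value


as_double = raw_size_bytes(ITEMS, DIMENSIONS, 8)  # assumption: 64-bit floats
as_single = raw_size_bytes(ITEMS, DIMENSIONS, 4)  # hypothetical 32-bit storage
print(f"float64: {as_double} bytes = {as_double / 2**20:.1f} MiB")
print(f"float32: {as_single} bytes = {as_single / 2**20:.1f} MiB")
# float64: 164352000 bytes = 156.7 MiB
# float32: 82176000 bytes = 78.4 MiB
```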

splitbrain, Jun 10 '23 09:06

It's possible to make a low-memory implementation. For this, two things need to be done:

  1. Create a disk-based tree builder that does not need to hold the tree in RAM.
  2. Remove caching during the search.
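A minimal sketch of what item 2 could look like, written in Python rather than PHP and not based on the library's actual .bin format. Nodes are stored as fixed-size records (left child, right child, split axis, coordinates), so a search can `seek()` to any node and read just that one record, holding only the current root-to-leaf path in memory. The names `record_format`, `build`, `read_node` and `nearest` are hypothetical. The builder writes records sequentially in pre-order, but it still sorts the points in RAM, so it illustrates the on-disk layout rather than solving item 1.

```python
import math
import struct
import tempfile
from typing import BinaryIO, List, Sequence, Tuple


def record_format(dims: int) -> struct.Struct:
    """Fixed-size node record: left child, right child, split axis, coordinates.

    Child indices are record numbers; -1 means "no child"."""
    return struct.Struct(f"<3i{dims}d")


def build(points: List[Sequence[float]], out: BinaryIO, fmt: struct.Struct,
          depth: int = 0, index: int = 0) -> int:
    """Write the subtree for `points` in pre-order, starting at record `index`.

    Returns the number of records written. Caveat: the points are still
    sorted in RAM, so this is not a RAM-free builder."""
    if not points:
        return 0
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left_pts, right_pts = ordered[:mid], ordered[mid + 1:]
    # In pre-order the left subtree directly follows its parent, and the
    # right subtree follows the whole left subtree.
    left = index + 1 if left_pts else -1
    right = index + 1 + len(left_pts) if right_pts else -1
    out.seek(index * fmt.size)
    out.write(fmt.pack(left, right, axis, *ordered[mid]))
    build(left_pts, out, fmt, depth + 1, index + 1)
    build(right_pts, out, fmt, depth + 1, index + 1 + len(left_pts))
    return len(ordered)


def read_node(f: BinaryIO, fmt: struct.Struct, index: int):
    """Read exactly one node record from disk; nothing is cached."""
    f.seek(index * fmt.size)
    left, right, axis, *coords = fmt.unpack(f.read(fmt.size))
    return left, right, axis, coords


def nearest(f: BinaryIO, fmt: struct.Struct,
            query: Sequence[float]) -> Tuple[List[float], float]:
    """Uncached nearest-neighbour search over a non-empty record file."""
    best: List = [None, math.inf]  # [closest point, squared distance]

    def visit(index: int) -> None:
        if index < 0:
            return
        left, right, axis, coords = read_node(f, fmt, index)
        d2 = sum((c - q) ** 2 for c, q in zip(coords, query))
        if d2 < best[1]:
            best[0], best[1] = coords, d2
        diff = query[axis] - coords[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Only cross the split plane if a closer point could lie beyond it.
        if diff * diff < best[1]:
            visit(far)

    visit(0)
    return best[0], math.sqrt(best[1])


if __name__ == "__main__":
    pts = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
    fmt = record_format(2)
    with tempfile.TemporaryFile() as f:
        build(pts, f, fmt)
        point, dist = nearest(f, fmt, (9.0, 2.0))
        print(point, round(dist, 3))  # prints [8.0, 1.0] 1.414
```

Peak memory during the search is then roughly one record per tree level rather than the whole tree; the trade-off is one disk read per visited node, which is where a bounded cache becomes interesting.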

hexogen, Jun 10 '23 18:06

Removing caching or constraining its depth is easy, but the tree builder needs a new implementation.
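A hypothetical illustration of "constraining its depth", in Python and not the library's actual code: cache only nodes near the root. Every query starts at the root, so the top few levels are read over and over, while each deeper node is shared by fewer queries. `DepthLimitedCache`, `load` and `max_depth` are invented names; `load` is assumed to read one node by its index, for example from a fixed-size record file.

```python
from typing import Callable, Dict, Generic, TypeVar

N = TypeVar("N")


class DepthLimitedCache(Generic[N]):
    """Node cache that only keeps nodes above `max_depth` (root = depth 0).

    A binary tree has at most 2**max_depth - 1 nodes above that depth, so the
    cache stays bounded however large the tree grows. max_depth=0 turns
    caching off entirely, i.e. every lookup goes to disk."""

    def __init__(self, load: Callable[[int], N], max_depth: int) -> None:
        self._load = load              # hypothetical: reads one node by index
        self._max_depth = max_depth
        self._cache: Dict[int, N] = {}
        self.disk_reads = 0            # counts calls to `load`

    def get(self, index: int, depth: int) -> N:
        if depth >= self._max_depth:
            return self._read(index)   # deep node: read, use, forget
        if index not in self._cache:
            self._cache[index] = self._read(index)
        return self._cache[index]

    def _read(self, index: int) -> N:
        self.disk_reads += 1
        return self._load(index)

    def __len__(self) -> int:
        return len(self._cache)


if __name__ == "__main__":
    cache = DepthLimitedCache(lambda i: f"node {i}", max_depth=2)
    for _ in range(3):
        cache.get(0, depth=0)   # root: read once, then served from cache
        cache.get(9, depth=5)   # deep node: read from disk every time
    print(cache.disk_reads, len(cache))  # prints 4 1
```

While descending the tree, a search would call `get(child, depth + 1)`; with `max_depth=10` (an arbitrary example value) at most 1023 nodes stay in memory.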

hexogen, Jun 10 '23 18:06