kdtree
Updating the tree / avoiding loading the entire tree into RAM
I'm wondering if I could use this implementation to efficiently store OpenAI embedding vectors for my documents. However, from what I understand of the documentation, the tree has to be built completely in RAM before it can be persisted to the filesystem and then searched more efficiently. It also seems that an already created tree can't be updated or extended.
Would it be feasible to implement updating/extending the tree on disk?
It seems my assumption that the search would be more memory efficient was false. Some data, in case others are interested:
- Items in the Tree: 13375
- Dimensions: 1536
- .bin file size for FKDTree: 157 MB
- Peak memory during tree creation: 550 MB
- Peak memory during nearest neighbor search: 510 MB
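As a rough sanity check on the file size, the raw coordinates alone come out at about the same figure. This is only a back-of-the-envelope sketch: the 8 bytes per coordinate is an assumption (64-bit floats), not something confirmed by the library's file format.

```python
# Back-of-the-envelope size of the raw coordinates alone.
items = 13375          # items in the tree (from the figures above)
dimensions = 1536      # dimensions per embedding vector
bytes_per_float = 8    # ASSUMPTION: each coordinate is stored as a 64-bit float

raw_bytes = items * dimensions * bytes_per_float
raw_mib = raw_bytes / (1024 ** 2)

print(f"{raw_bytes} bytes = {raw_mib:.1f} MiB")
# prints "164352000 bytes = 156.7 MiB"
```

If that assumption holds, nearly all of the file is coordinate data, so the peak memory during search would come from holding that data (plus per-value overhead) in RAM rather than from the tree structure itself.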
It's super fast, but unfortunately the memory requirements aren't compatible with most hosted PHP setups.
It's possible to make a low-memory implementation. Two things need to be done for this:
- Create a disk-based tree builder that does not hold the tree in RAM.
- Remove caching during the search.
Removing the caching or constraining its depth is easy, but the tree builder needs a new implementation.
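To illustrate what constraining the cache could look like, here is a minimal sketch in Python rather than the library's PHP. It does not reflect the library's actual code or file format: the fixed-size record layout, the `BoundedNodeReader` class, and the `max_cached` parameter are all hypothetical. The cache is bounded by entry count with least-recently-used eviction, and `max_cached=0` disables caching entirely.

```python
import os
import struct
import tempfile
from collections import OrderedDict

DIMENSIONS = 4  # tiny value for the demo; the thread's data uses 1536
# HYPOTHETICAL layout: one fixed-size record of little-endian doubles per point.
RECORD = struct.Struct(f"<{DIMENSIONS}d")


class BoundedNodeReader:
    """Reads points from disk on demand, keeping at most max_cached in RAM."""

    def __init__(self, path, max_cached):
        self._file = open(path, "rb")
        self._max = max_cached
        self._cache = OrderedDict()  # index -> point, oldest entry first
        self.disk_reads = 0

    def get(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
            return self._cache[index]
        # Cache miss: seek straight to the record instead of loading the file.
        self._file.seek(index * RECORD.size)
        point = RECORD.unpack(self._file.read(RECORD.size))
        self.disk_reads += 1
        self._cache[index] = point
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the least recently used
        return point

    def close(self):
        self._file.close()


# Demo: write ten points to a temporary file, then read with a 2-entry cache.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(10):
        tmp.write(RECORD.pack(*[float(i)] * DIMENSIONS))
    path = tmp.name

reader = BoundedNodeReader(path, max_cached=2)
first = reader.get(3)   # read from disk
again = reader.get(3)   # served from the cache
reader.get(4)
reader.get(5)           # cache is full, so point 3 is evicted
reader.get(3)           # has to be read from disk again
reads = reader.disk_reads
reader.close()
os.remove(path)
```

With this approach, memory use during search is capped at roughly `max_cached` records, at the cost of extra disk seeks for points that were evicted.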