R2R
Feature/ LanceDB integration
Hi,
I have added the code for the LanceDB vector store class.
It supports:
- init_collection
- upsert/bulk upsert
- filtered search (metadata filtering is supported as well, but we should decide on a schema for the main PR)
- filtered deletion
- unique values in metadata
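For discussion of the metadata schema, here is a minimal sketch of how a flat metadata filter could be turned into the SQL-style filter string that LanceDB accepts for filtered search and deletion. The helper name `build_where_clause` and the assumption that each metadata key is stored as its own top-level column are hypothetical; the schema in this PR is not final.

```python
from typing import Mapping, Union

Scalar = Union[str, int, float, bool]


def build_where_clause(filters: Mapping[str, Scalar]) -> str:
    """Build an equality-only, AND-joined filter string (hypothetical helper).

    Assumes every metadata key is a top-level column in the table.
    """
    parts = []
    for key, value in filters.items():
        # Reject anything that is not a plain identifier to avoid injection.
        if not key.isidentifier():
            raise ValueError(f"unsafe column name: {key!r}")
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            literal = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Escape single quotes by doubling them, as in standard SQL.
            literal = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"{key} = {literal}")
    return " AND ".join(parts)


clause = build_where_clause({"doc_id": "a1", "page": 3})
```

The resulting string could then be passed to the search or delete call of the table. If metadata ends up stored as a nested struct or a JSON string instead, the column references would need to change accordingly.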
TODO: Code cleanup and minor fixes are pending, along with the final schema. Add docs wherever necessary.
Testing:
I tested by adding an example: I added configs/local_ollama_lancedb.json and ran run_test_client.py.
I am facing an issue with bulk insert and have commented out the code in r2r/main/app.py that I think is causing it. Storage happens in embedding.py via upsert_entries, but it is called sequentially, so no bulk insert actually happens, and the run() method doesn't accept a list; it accepts a single DocumentPage instance.
If I am doing something wrong here, please let me know; otherwise I would need to request some changes in the r2r code.
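One possible way around the sequential calls, sketched under assumptions: buffer entries as they arrive one at a time and hand them to the store in batches. `BatchingWriter` and `flush_fn` are hypothetical names, not part of r2r; `flush_fn` stands in for whatever bulk upsert method the vector store exposes.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchingWriter(Generic[T]):
    """Collects single entries and forwards them in batches (hypothetical)."""

    def __init__(self, flush_fn: Callable[[List[T]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn  # e.g. the store's bulk upsert (assumption)
        self._batch_size = batch_size
        self._buffer: List[T] = []

    def add(self, entry: T) -> None:
        """Queue one entry; flush automatically once the batch is full."""
        self._buffer.append(entry)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; call once more after the last entry."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_fn(batch)


# Usage: a plain list records each batch that would be written.
written: List[List[int]] = []
writer = BatchingWriter(written.append, batch_size=2)
for item in range(5):
    writer.add(item)
writer.flush()  # write the final partial batch
```

This keeps the per-page run() signature unchanged, at the cost of needing a final flush() when the pipeline finishes. Changing run() to accept a list would be the more direct fix, but that requires changes in the r2r code.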
Thanks
| :rocket: This description was created by Ellipsis for commit 74e9b5b1f0f410ce0f78d6c1effcc6e9a5ba8cbd |
|---|
Summary:
Integrates LanceDB as a new vector database provider, enhancing the system's capabilities with metadata filtering and addressing bulk insert issues.
Key points:
- Added LanceDB integration for vector storage and retrieval.
- Modified configuration and core files to support LanceDB.
- Addressed bulk insert issues and added metadata filtering.
- Tested integration with example client and configurations.
Generated with :heart: by ellipsis.dev
@raghavdixit99 is attempting to deploy a commit to the Sciphi-Team Team on Vercel.
A member of the Team first needs to authorize it.
Hi @emrgnt-cmplxty, could you have a look at the dev PR? I will raise an official one once I get some clarity on the above as well as the metadata schema.
@emrgnt-cmplxty hey, can we get some action on this one?