DocsGPT
Vectorstore/lancedb
- What kind of change does this PR introduce? (Bug fix, feature, docs update, ...) Added LanceDB as a vector store.
- Why was this change needed? (You can also link to an open issue here) LanceDB is a serverless vector database for AI applications. Easily add long-term memory to your LLM apps.
@akashAD98 is attempting to deploy a commit to the Arc53 Team on Vercel.
A member of the Team first needs to authorize it.
Codecov Report
Attention: Patch coverage is 52.63158% with 9 lines in your changes missing coverage. Please review.
Project coverage is 20.19%. Comparing base (3c49206) to head (8c4f96d). Report is 30 commits behind head on main.
:exclamation: Current head 8c4f96d differs from pull request most recent head 187d7be. Consider uploading reports for the commit 187d7be to get more accurate results
Files | Patch % | Lines |
---|---|---|
application/vectorstore/lancedb.py | 50.00% | 9 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #889 +/- ##
==========================================
+ Coverage 20.00% 20.19% +0.18%
==========================================
Files 72 73 +1
Lines 3264 3283 +19
==========================================
+ Hits 653 663 +10
- Misses 2611 2620 +9
The latest updates on your projects.
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
docs-gpt | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Mar 27, 2024 7:12pm |
Any chance of merging this, @ajaythapliyal @dartpain?
Thanks @akashAD98 for your contribution and the effort you've put into this PR. I've taken a look at the code, and I have a few points I think we should address before merging:
- Error handling for `docs_init`: It's crucial to handle cases where `docs_init` is not provided. Without proper initialisation, the object won't be instantiated correctly, leading to potential issues with other methods dependent on it.
- Missing configuration variables from settings: It seems like there's a gap in providing necessary configuration variables from settings. This could limit the flexibility and usability of the module.
- Implementation of the `delete_index` method.
Overall, addressing these points will enhance the robustness and usability of the module. Looking forward to your updates!
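To make the first review point concrete, here is a minimal sketch of failing early when `docs_init` is missing. `SketchStore` and its `load_existing` callback are hypothetical names used only for illustration; they are not the PR's class or any LanceDB API, and the search is a naive substring match so the example runs without any vector database installed.

```python
from typing import Callable, Optional, Sequence


class SketchStore:
    """Hypothetical stand-in for a vector store wrapper (not DocsGPT's class)."""

    def __init__(
        self,
        docs_init: Optional[Sequence[str]] = None,
        load_existing: Optional[Callable[[], Sequence[str]]] = None,
    ):
        if docs_init:
            # Build a fresh index from the supplied documents.
            self.docs = list(docs_init)
        elif load_existing is not None:
            # No documents supplied: fall back to a previously saved index.
            self.docs = list(load_existing())
        else:
            # Fail at construction time rather than later inside search().
            raise ValueError("Provide docs_init or a way to load an existing index.")

    def search(self, query: str, k: int = 5) -> list:
        # Naive case-insensitive substring match, only to keep the sketch runnable.
        return [d for d in self.docs if query.lower() in d.lower()][:k]
```

With this shape, a misconfigured store raises a clear error when it is created, instead of leaving dependent methods to fail on a half-initialised object.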
Thanks for the reply and review, @siiddhantt.
1. Done.
2. I'm already importing settings, and no other configuration is needed.
3. LanceDB doesn't support delete methods as of now, based on what I saw when I checked their LangChain integration.
@siiddhantt @dartpain
My concern is how we can pass the URI, for example, to make it work.
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
Check out the quickstart: https://lancedb.github.io/lancedb/basic/#installation
We can also pass uri="/tmp/lancedb".
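One possible way to make the URI configurable is sketched below. The `LANCEDB_URI` environment variable and the `resolve_lancedb_uri` helper are assumptions made for illustration; neither exists in DocsGPT's settings or in LanceDB. The default path is the `/tmp/lancedb` value mentioned above.

```python
import os
from typing import Mapping, Optional

# Default local path, taken from the comment above.
DEFAULT_LANCEDB_URI = "/tmp/lancedb"


def resolve_lancedb_uri(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a URI: explicit argument first, then LANCEDB_URI (hypothetical), then the default."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    # An unset or empty variable falls through to the default path.
    return env.get("LANCEDB_URI") or DEFAULT_LANCEDB_URI
```

The resolved value could then be handed to `lancedb.connect(uri)` as in the snippet above, or to the store's constructor.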
from langchain_community.vectorstores import LanceDB
from application.vectorstore.base import BaseVectorStore
from application.core.settings import settings


class LancedbStore(BaseVectorStore):
    def __init__(self, uri, embeddings_key):
        super().__init__()
        self.uri = uri
        self.embeddings_key = embeddings_key
        self.docsearch = None
        # Initialize the embeddings using the provided key
        embeddings = self._get_embeddings(settings.EMBEDDINGS_NAME, self.embeddings_key)
        # Initialize LanceDB with the appropriate URI and embeddings
        self.docsearch = LanceDB(
            uri=self.uri,
            embedding=embeddings,
            api_key=settings.LANCE_API_KEY,  # Assuming API Key is managed in settings
            region=settings.LANCE_REGION  # Assuming Region is managed in settings
        )

    def search(self, query, k=5, **kwargs):
        # Perform a similarity search using LanceDB
        if self.docsearch:
            return self.docsearch.similarity_search(query=query, k=k, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")

    def add_texts(self, texts, metadatas=None, ids=None, **kwargs):
        # Add texts to the LanceDB instance
        if self.docsearch:
            return self.docsearch.add_texts(texts, metadatas=metadatas, ids=ids, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")

    def delete(self, ids=None, delete_all=False, filter=None, drop_columns=None, name=None, **kwargs):
        # Delete documents from the LanceDB instance
        if self.docsearch:
            self.docsearch.delete(ids=ids, delete_all=delete_all, filter=filter, drop_columns=drop_columns, name=name, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")

    def save_local(self, *args, **kwargs):
        # Currently, it's just a placeholder as LanceDB operations are handled internally
        pass
Can it be done like this? I haven't tested it; this is just the approach. @dartpain
Yeah, I think this will solve most of the issues. Sorry for the delay.
@dartpain can you please add LanceDB as a vector DB? That would be very helpful, as I'm looking for a serverless vector DB.
Yeah, I'll add it once I have a bit more capacity, or someone from our team will. I like the DB.
Thanks, looking forward to this.
@dartpain any update on this?