Vectorstore/lancedb

Open akashAD98 opened this issue 11 months ago • 14 comments

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...) Feature: adds LanceDB as a vector store.

  • Why was this change needed? (You can also link to an open issue here) LanceDB is a serverless vector database for AI applications. It makes it easy to add long-term memory to LLM apps.

akashAD98 avatar Mar 24 '24 09:03 akashAD98

@akashAD98 is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Mar 24 '24 09:03 vercel[bot]

Codecov Report

Attention: Patch coverage is 52.63158% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 20.19%. Comparing base (3c49206) to head (8c4f96d). Report is 30 commits behind head on main.

:exclamation: Current head 8c4f96d differs from pull request most recent head 187d7be. Consider uploading reports for the commit 187d7be to get more accurate results

Files Patch % Lines
application/vectorstore/lancedb.py 50.00% 9 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #889      +/-   ##
==========================================
+ Coverage   20.00%   20.19%   +0.18%     
==========================================
  Files          72       73       +1     
  Lines        3264     3283      +19     
==========================================
+ Hits          653      663      +10     
- Misses       2611     2620       +9     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 24 '24 13:03 codecov[bot]

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs-gpt ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 27, 2024 7:12pm

vercel[bot] avatar Mar 27 '24 19:03 vercel[bot]

Any chance of merging this, @ajaythapliyal @dartpain?

akashAD98 avatar Mar 28 '24 14:03 akashAD98

Thanks @akashAD98 for your contribution and the effort you've put into this PR. I've taken a look at the code, and I have a few points I think we should address before merging:

  • Error Handling for docs_init: It's crucial to handle cases where docs_init is not provided. Without proper initialisation, the object won't be instantiated correctly, leading to potential issues with other methods dependent on it.
  • Missing Configuration Variables from Settings: It seems like there's a gap in providing necessary configuration variables from settings. This could limit the flexibility and usability of the module.
  • Implementation of delete_index method.
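The first and third points can be sketched roughly as follows. This is only an illustration and not the PR's code: `SketchVectorStore` is a hypothetical in-memory stand-in, and a real store would create and drop a LanceDB table instead of holding a Python list.

```python
from typing import List, Optional


class SketchVectorStore:
    """Hypothetical stand-in for a vector store wrapper (not the PR's class)."""

    def __init__(self, path: str, docs_init: Optional[List[str]] = None):
        # Fail fast instead of leaving the object half-initialised.
        if not docs_init:
            raise ValueError("docs_init must contain at least one document")
        self.path = path
        self.docs: List[str] = list(docs_init)

    def delete_index(self) -> None:
        # Drop everything held by this store; a real backend would delete
        # its table or files here.
        self.docs.clear()
```

Raising in the constructor means callers see the problem at creation time, before any other method that depends on the index is called.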


Overall, addressing these points will enhance the robustness and usability of the module. Looking forward to your updates!

siiddhantt avatar Apr 02 '24 12:04 siiddhantt

Thanks for the reply and review, @siiddhantt. 1. Done. 2. I'm already importing settings, and no other configuration is needed. 3. LanceDB doesn't support delete methods as of now, as far as I can tell from its LangChain integration.

akashAD98 avatar Apr 04 '24 16:04 akashAD98

@siiddhantt @dartpain

akashAD98 avatar Apr 16 '24 15:04 akashAD98

My concern is how we can pass the URI, for example, to make it work.

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

Check out the quickstart https://lancedb.github.io/lancedb/basic/#installation
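One possible way to make the URI configurable is sketched below. It assumes a hypothetical `LANCEDB_URI` environment variable and a hypothetical helper name, `resolve_lancedb_uri`; neither exists in the project's settings today. The fallback is the sample path from the snippet above.

```python
import os
from typing import Mapping, Optional

# Sample path from the quickstart snippet above, used as the fallback.
DEFAULT_LANCEDB_URI = "data/sample-lancedb"


def resolve_lancedb_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the URI from the hypothetical LANCEDB_URI variable, or the default."""
    env = os.environ if env is None else env
    uri = env.get("LANCEDB_URI", "").strip()
    return uri or DEFAULT_LANCEDB_URI
```

The result could then be passed straight to `lancedb.connect(uri)`, as in the snippet above.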

dartpain avatar Apr 16 '24 16:04 dartpain

We can also pass the URI explicitly (uri="/tmp/lancedb"):

from langchain_community.vectorstores import LanceDB
from application.vectorstore.base import BaseVectorStore
from application.core.settings import settings

class LancedbStore(BaseVectorStore):
    def __init__(self, uri, embeddings_key):
        super().__init__()
        self.uri = uri
        self.embeddings_key = embeddings_key
        self.docsearch = None  
        
        # Initialize the embeddings using the provided key
        embeddings = self._get_embeddings(settings.EMBEDDINGS_NAME, self.embeddings_key)
        
        # Initialize LanceDB with the appropriate URI and embeddings
        self.docsearch = LanceDB(
            uri=self.uri,
            embedding=embeddings,
            api_key=settings.LANCE_API_KEY,  # Assuming API Key is managed in settings
            region=settings.LANCE_REGION    # Assuming Region is managed in settings
        )
    
    def search(self, query, k=5, **kwargs):
        # Perform a similarity search using LanceDB
        if self.docsearch:
            return self.docsearch.similarity_search(query=query, k=k, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def add_texts(self, texts, metadatas=None, ids=None, **kwargs):
        # Add texts to the LanceDB instance
        if self.docsearch:
            return self.docsearch.add_texts(texts, metadatas=metadatas, ids=ids, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def delete(self, ids=None, delete_all=False, filter=None, drop_columns=None, name=None, **kwargs):
        # Delete documents from the LanceDB instance
        if self.docsearch:
            self.docsearch.delete(ids=ids, delete_all=delete_all, filter=filter, drop_columns=drop_columns, name=name, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def save_local(self, *args, **kwargs):
        # Currently, it's just a placeholder as LanceDB operations are handled internally
        pass


Can it be done like this? I haven't tested it; it's just an approach, @dartpain.

akashAD98 avatar May 05 '24 12:05 akashAD98

Yeah, I think this will solve most of the issues. Sorry for the delay.

dartpain avatar Jun 25 '24 12:06 dartpain

@dartpain, can you please add LanceDB as a vector DB? That would be very helpful, as I'm looking for a serverless vector DB.

akashAD98 avatar Jun 25 '24 14:06 akashAD98

Yeah, I'll add it once I have a bit more capacity, or someone from our team will. I like the DB.

dartpain avatar Jun 26 '24 11:06 dartpain

Thanks, looking forward to this.

akashAD98 avatar Jun 26 '24 13:06 akashAD98

@dartpain, any update on this?

akashAD98 avatar Jul 17 '24 14:07 akashAD98