langchain4j-examples

If I already have my embeddings data previously inserted in my DB, is it OK to use InMemoryEmbeddingStore in the agent logic?

caosDvlp opened this issue 1 year ago · 0 comments

Hi!

I already have a service that saves my data as embeddings in a pgvector-compatible PostgreSQL DB.

Now I have built the logic for the agent and it is working very well, but I noticed that even though I already have the data stored in the DB, it seems necessary to store it temporarily in an InMemoryEmbeddingStore in order to use the ContentRetriever and then call the AiService. Right now I'm fetching the embeddings from my DB and then I need to run the following logic to call the AiService:

```java
List<Embedding> embeddings = bedrockTitanEmbeddingModel.embedAll(textSegments).content();
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
embeddingStore.addAll(embeddings, textSegments);

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(bedrockTitanEmbeddingModel)
        .maxResults(15) // on each interaction we will retrieve the 15 most relevant segments
        .minScore(0.2)  // we want to retrieve segments similar enough to the user query
        .build();

// Optionally, we can use a chat memory, enabling back-and-forth conversation with the LLM
// and allowing it to remember previous interactions.
// Currently, LangChain4j offers two chat memory implementations:
// MessageWindowChatMemory and TokenWindowChatMemory.
ChatMemory chatMemory = MessageWindowChatMemory.withMaxMessages(10);

return AiServices.builder(ErekyAiAgent.class)
        .chatMemory(chatMemory)
        .chatLanguageModel(bedrockAnthropicChatModel)
        .contentRetriever(contentRetriever)
        .build();
```

Is there a way to avoid this intermediate storing, since I already have my data saved in the DB, and to use that data directly when calling the AI? Or is this intermediate in-memory store always mandatory?

I checked your PgVector implementation, but it creates a table and stores the data there every time a user asks the AI a question. I don't want this, since the data was already saved by another service, and this is the reason I'm using the InMemoryEmbeddingStore. Is this the right approach?
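For clarity, this is roughly what I was hoping would work: pointing the ContentRetriever directly at the existing pgvector table, with no intermediate InMemoryEmbeddingStore. Please note the builder parameters below (the connection details, the table name `my_existing_embeddings`, the dimension, and especially `createTable(false)`) are my guesses from skimming the PgVectorEmbeddingStore code, so they may not be accurate:

```java
// Hypothetical sketch: reuse an already-populated pgvector table directly.
// Import paths and builder options are my assumptions and may differ by version.
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;

EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
        .host("localhost")
        .port(5432)
        .database("mydb")
        .user("user")
        .password("password")
        .table("my_existing_embeddings") // table already populated by my other service
        .dimension(1536)                 // assuming the Titan embedding dimension
        .createTable(false)              // do NOT create or overwrite the table
        .build();

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)  // query pgvector directly, no in-memory copy
        .embeddingModel(bedrockTitanEmbeddingModel)
        .maxResults(15)
        .minScore(0.2)
        .build();
```

If something like this is supported, I could drop the `embedAll` + `addAll` step from my agent logic entirely.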

Thank you guys! I find your project super interesting, and that is the reason I'm asking this question and also some others in other issues. ;)

caosDvlp · Feb 07 '24 11:02