Vectorsearch::Pinecone - `#ask` tokens from context exceed max tokens allowed
Context
When asking a question, depending on how the data has been stored, the retrieved context can exceed the LLM's max token limit. https://github.com/andreibondarev/langchainrb/blob/ccd0fd53a9737fb61c82058e86da1c9b855ccd7f/lib/langchain/vectorsearch/pinecone.rb#L113-L123
Suggestion
In this `#ask` method, we could:
- Add the option to choose the number (`k`) of context results to fetch when querying the DB
- Add custom logic when building the context from the retrieved elements, so that the context stays under a given token count (a sketch follows this list)
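A minimal sketch of what the two suggestions could look like combined, assuming `similarity_search(query:, k:)` and `generate_prompt(question:, context:)` exist as in the linked `pinecone.rb`; the `max_context_tokens` parameter and the characters/4 token estimate are illustrative assumptions, not the library's API:

```ruby
# Hypothetical sketch, not the actual library code.
def ask(question:, k: 4, max_context_tokens: 3000)
  search_results = similarity_search(query: question, k: k)

  chunks = search_results.map { |result| result.dig("metadata").to_s }

  # Greedily keep chunks while they fit the token budget. Tokens are
  # roughly estimated as characters / 4; a real implementation would
  # use the LLM's tokenizer instead.
  budget = max_context_tokens
  kept = chunks.take_while do |chunk|
    cost = chunk.length / 4
    (budget -= cost) >= 0
  end

  prompt = generate_prompt(question: question, context: kept.join("\n---\n"))
  llm.chat(prompt: prompt)
end
```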
WDYT?
I noticed this too with OpenAI and proposed something here: https://github.com/andreibondarev/langchainrb/issues/123 to automatically set the quantity. So ideally either let it be set automatically, or let the caller choose it?
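One way "set the quantity automatically" could work (just sketching the idea; the helper name, the average-chunk-size figure, and the reserve for the answer are all hypothetical assumptions):

```ruby
# Hypothetical helper: derive k from the model's context window.
# avg_chunk_tokens and answer_reserve are rough guesses that would
# need tuning per dataset and per model.
def auto_k(model_max_tokens, avg_chunk_tokens: 256, answer_reserve: 512)
  usable = model_max_tokens - answer_reserve
  [usable / avg_chunk_tokens, 1].max
end

auto_k(4096)   # => 14
auto_k(16_384) # => 62
```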
@mael-ha
- You'd like to introduce a new `k:` param in the `ask()` method to pass it down to `similarity_search()`, right? It makes a lot of sense! I think we should do it for the other Vectorsearch classes as well.
- We've got `TokenLengthValidator` but it currently only works for OpenAI. It's called when the `OpenAI#chat()` method is invoked: https://github.com/andreibondarev/langchainrb/blob/main/lib/langchain/llm/openai.rb#L105-L107. Did you not see this error raised?
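For reference, here is a standalone way to reproduce that kind of check outside the library, assuming the third-party `tiktoken_ruby` gem; this is not `TokenLengthValidator` itself, whose exact interface may differ:

```ruby
require "tiktoken_ruby" # third-party OpenAI tokenizer bindings

# Hypothetical standalone check, mirroring what a token-length validator
# does: count tokens for a given model and raise if over the limit.
def validate_token_length!(text, model: "gpt-3.5-turbo", limit: 4096)
  tokens = Tiktoken.encoding_for_model(model).encode(text).length
  raise ArgumentError, "Prompt is #{tokens} tokens; limit is #{limit}" if tokens > limit
  tokens
end
```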
I think the future of vector storage usage is within the context of the Assistant, not via the `ask()` method.