feat: Tokenizer Aware Prompt Builder
**Is your feature request related to a problem? Please describe.** For RAG QA we often want to fully utilize the context window of the model by inserting as many retrieved documents as possible. However, it is not easy for a user to know ahead of time how many documents they can pass to the LLM without overflowing the context window. Currently this can only be accomplished by trial and error, and oftentimes choosing a "correct" top_k is not possible at all: the documents in a database can vary greatly in length, so depending on the retrieved documents some queries might cause an overflow while others do not.
**Describe the solution you'd like** We would therefore like a type of Prompt Builder that can truncate some of the variables inserted into the prompt (e.g. truncate the documents but none of the instructions). This basically amounts to calculating a dynamic top_k based on the token count of the retrieved documents. To perform this truncation, the Prompt Builder would need to be tokenizer aware.
This would allow users to set a relatively large top_k with the confidence that the least relevant documents get removed if they happen to cause the context window to be exceeded. It would also provide a more consistent search experience, since we would no longer run the risk of cutting off instructions, which often come after the inserted documents in the prompt.
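For illustration, a minimal sketch of the budget logic such a builder could apply internally, written as plain Python with a Hugging Face tokenizer; the function name, arguments, and numbers are made up for this example:

```python
from transformers import AutoTokenizer


def fit_documents(docs, prompt_without_docs, context_window, reserved_for_answer, tokenizer):
    """Keep as many relevance-ordered documents as fit into the remaining token budget."""
    budget = context_window - reserved_for_answer
    # Tokens already taken up by the instructions, the question, etc.
    budget -= len(tokenizer.encode(prompt_without_docs, add_special_tokens=False))

    kept = []
    for doc in docs:
        cost = len(tokenizer.encode(doc, add_special_tokens=False))
        if cost > budget:
            break  # drop this and all less relevant documents
        kept.append(doc)
        budget -= cost
    return kept


tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the target model's tokenizer
docs = ["Paris is the capital of France. " * 20, "Berlin is the capital of Germany."]
fitted = fit_documents(docs, "Answer the question using the documents below.",
                       context_window=512, reserved_for_answer=128, tokenizer=tokenizer)
```

The effective dynamic top_k is then simply `len(fitted)`.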
Can we reformulate the issue to something like:
"Provide tokenization options to limit document or text length in pipelines"
I could see multiple places where this applies and multiple strategies too.
For documents in a prompt we could:
- start truncating from the end
- truncate every document a little so that all of them fit (a rough sketch of this one is below)
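A minimal sketch of the second strategy, again assuming a Hugging Face tokenizer; the even split of the budget is just one possible policy:

```python
from transformers import AutoTokenizer


def truncate_each(docs, max_total_tokens, tokenizer):
    """Shrink every document to an equal share of the token budget so all of them fit."""
    per_doc = max_total_tokens // max(len(docs), 1)
    ids_per_doc = [tokenizer.encode(d, add_special_tokens=False) for d in docs]
    return [tokenizer.decode(ids[:per_doc]) for ids in ids_per_doc]


tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(truncate_each(["first document " * 100, "second document " * 100], 200, tokenizer))
```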
~~But it's not only prompts, the same would apply to a ranker too.~~
> But it's not only prompts, the same would apply to a ranker too.
Just out of curiosity, what scenario do you have in mind where this would be relevant to have only in the Ranker?
I was actually too fast with that :D
Ranker only gets one document at a time.
I'm going to give this a shot. I'll draft something under `PromptTokenAwareBuilder` and report back for feedback.
Wouldn't it be nicer to just add a new component that crops the context to the number of tokens needed, depending on how you want to do it: deleting whole docs or just a piece of each? This way, we don't need to change all the components one by one to adopt this, just add this component to the pipeline.
Yes, I came around on that too.
My thoughts were something like a `DocumentsTokenTruncater` (probably not a great name) that accepts a list of documents and can truncate them according to different strategies (e.g. truncate left, right, or each). A `TextTokenTruncater` could be added for other use cases.
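A very rough sketch of what such a component might look like, assuming Haystack 2.x's `@component` API and a Hugging Face tokenizer; the class name comes from the comment above, while the strategy names and defaults are invented for illustration:

```python
from typing import List, Literal

from haystack import Document, component
from transformers import AutoTokenizer


@component
class DocumentsTokenTruncater:
    """Keeps or shrinks documents so that their combined size stays under a token budget."""

    def __init__(self, model: str = "gpt2", max_tokens: int = 4096,
                 strategy: Literal["right", "left", "each"] = "right"):
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.max_tokens = max_tokens
        self.strategy = strategy

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        if self.strategy == "each":
            # Give every document an equal share of the budget.
            per_doc = self.max_tokens // max(len(documents), 1)
            return {"documents": [self._shorten(d, per_doc) for d in documents]}

        # "right" drops documents from the end of the list, "left" from the start.
        ordered = documents if self.strategy == "right" else list(reversed(documents))
        kept, budget = [], self.max_tokens
        for doc in ordered:
            cost = len(self.tokenizer.encode(doc.content, add_special_tokens=False))
            if cost > budget:
                break
            kept.append(doc)
            budget -= cost
        if self.strategy == "left":
            kept.reverse()
        return {"documents": kept}

    def _shorten(self, doc: Document, max_len: int) -> Document:
        ids = self.tokenizer.encode(doc.content, add_special_tokens=False)
        return Document(content=self.tokenizer.decode(ids[:max_len]), meta=doc.meta)
```

Note that this only counts `doc.content`; any metadata rendered into the prompt would add tokens the component doesn't see.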
The only problem with that approach would be document metadata that you want to use in the prompt.
Generally, I feel like this is less important now since the context length of most models has increased so much.
> This way, we don't need to change all the components one by one to adopt this, just add this component to the pipeline.
One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
> Generally, I feel like this is less important now since the context length of most models has increased so much.
And yeah, I agree with this; it has become less urgent now since context lengths are so large nowadays.
> One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.
> One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
> People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.
We could just add a count-tokens method to the prompt template that accepts a tokenizer and returns the number of tokens in the prompt after removing all the Jinja stuff.
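For instance, something along these lines; a sketch that uses Jinja2's `meta.find_undeclared_variables` to blank out the template variables before counting, with a made-up function name and example template:

```python
from jinja2 import Environment, meta
from transformers import AutoTokenizer


def count_template_tokens(template_str: str, tokenizer) -> int:
    """Count the tokens of the fixed part of a prompt template, with all Jinja variables blanked out."""
    env = Environment()
    variables = meta.find_undeclared_variables(env.parse(template_str))
    rendered = env.from_string(template_str).render({name: "" for name in variables})
    return len(tokenizer.encode(rendered, add_special_tokens=False))


template = (
    "Answer the question using only the documents below.\n"
    "{% for doc in documents %}{{ doc.content }}\n{% endfor %}"
    "Question: {{ question }}"
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
overhead = count_template_tokens(template, tokenizer)
# Whatever remains of the context window (minus the question and the tokens reserved
# for the answer) is the budget that could be handed to the truncater.
```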