feat: Tokenizer Aware Prompt Builder
**Is your feature request related to a problem? Please describe.** For RAG QA we often want to fully utilize the context window of the model by inserting as many retrieved documents as possible. However, it is not easy for a user to know ahead of time how many documents they can pass to the LLM without overflowing the context window. Currently this can only be accomplished by trial and error, and oftentimes choosing a "correct" top_k is not possible at all: the documents in a database can vary greatly in length, so depending on the retrieved documents some queries might cause an overflow while others do not.
**Describe the solution you'd like** We would therefore like a type of Prompt Builder that can truncate some of the variables inserted into the prompt (e.g. truncate the documents but none of the instructions). This basically amounts to calculating a dynamic top_k based on the token count of the retrieved documents. To perform this truncation, the Prompt Builder would need to be tokenizer aware.
This would allow users to set a relatively large top_k with the confidence that the least relevant documents get removed if they happen to cause the context window to be exceeded. It would also provide a more consistent search experience, since we would no longer run the risk of cutting off instructions, which often come after the inserted documents in the prompt.
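For illustration, a minimal sketch of the budget logic such a builder could apply internally, written as plain Python with a Hugging Face tokenizer; the function name, arguments, and numbers are made up for this example:

```python
from transformers import AutoTokenizer


def fit_documents(docs, prompt_without_docs, context_window, reserved_for_answer, tokenizer):
    """Keep as many relevance-ordered documents as fit into the remaining token budget."""
    budget = context_window - reserved_for_answer
    # Tokens already taken up by the instructions, the question, etc.
    budget -= len(tokenizer.encode(prompt_without_docs, add_special_tokens=False))

    kept = []
    for doc in docs:
        cost = len(tokenizer.encode(doc, add_special_tokens=False))
        if cost > budget:
            break  # drop this and all less relevant documents
        kept.append(doc)
        budget -= cost
    return kept


tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the target model's tokenizer
docs = ["Paris is the capital of France. " * 20, "Berlin is the capital of Germany."]
fitted = fit_documents(docs, "Answer the question using the documents below.",
                       context_window=512, reserved_for_answer=128, tokenizer=tokenizer)
```

The effective dynamic top_k is then simply `len(fitted)`.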
Can we reformulate the issue to something like:
"Provide tokenization options to limit document or text length in pipelines"
I could see multiple places where this applies and multiple strategies too.
For documents in a prompt we could:
- start truncating from the end
- truncate every document a little so that all of them fit (a rough sketch of this one is below)
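A minimal sketch of the second strategy, again assuming a Hugging Face tokenizer; the even split of the budget is just one possible policy:

```python
from transformers import AutoTokenizer


def truncate_each(docs, max_total_tokens, tokenizer):
    """Shrink every document to an equal share of the token budget so all of them fit."""
    per_doc = max_total_tokens // max(len(docs), 1)
    ids_per_doc = [tokenizer.encode(d, add_special_tokens=False) for d in docs]
    return [tokenizer.decode(ids[:per_doc]) for ids in ids_per_doc]


tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(truncate_each(["first document " * 100, "second document " * 100], 200, tokenizer))
```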
~~But it's not only prompts, the same would apply to a ranker too.~~
> But it's not only prompts, the same would apply to a ranker too.
Just out of curiosity, what scenario do you have in mind where this would be relevant to have only in the Ranker?
I was actually too fast with that :D
Ranker only gets one document at a time.
I'm going to give this a shot. I'll draft something under `PromptTokenAwareBuilder` and report back for feedback.
Wouldn't it be nicer to just add a new component that crops the context to the number of tokens needed, depending on how you want to do it: deleting whole docs or just a piece of each? This way, we don't need to change all the components one by one to adopt this, just add this component to the pipeline.
Yes, I came around on that too.
My thoughts were something like a `DocumentsTokenTruncater` (probably not a great name) that accepts a list of documents and can truncate them according to different strategies (e.g. truncate left, right, or each). A `TextTokenTruncater` could be added for other use cases.
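A very rough sketch of what such a component might look like, assuming Haystack 2.x's `@component` API and a Hugging Face tokenizer; the class name comes from the comment above, while the strategy names and defaults are invented for illustration:

```python
from typing import List, Literal

from haystack import Document, component
from transformers import AutoTokenizer


@component
class DocumentsTokenTruncater:
    """Keeps or shrinks documents so that their combined size stays under a token budget."""

    def __init__(self, model: str = "gpt2", max_tokens: int = 4096,
                 strategy: Literal["right", "left", "each"] = "right"):
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.max_tokens = max_tokens
        self.strategy = strategy

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        if self.strategy == "each":
            # Give every document an equal share of the budget.
            per_doc = self.max_tokens // max(len(documents), 1)
            return {"documents": [self._shorten(d, per_doc) for d in documents]}

        # "right" drops documents from the end of the list, "left" from the start.
        ordered = documents if self.strategy == "right" else list(reversed(documents))
        kept, budget = [], self.max_tokens
        for doc in ordered:
            cost = len(self.tokenizer.encode(doc.content, add_special_tokens=False))
            if cost > budget:
                break
            kept.append(doc)
            budget -= cost
        if self.strategy == "left":
            kept.reverse()
        return {"documents": kept}

    def _shorten(self, doc: Document, max_len: int) -> Document:
        ids = self.tokenizer.encode(doc.content, add_special_tokens=False)
        return Document(content=self.tokenizer.decode(ids[:max_len]), meta=doc.meta)
```

Note that this only counts `doc.content`; any metadata rendered into the prompt would add tokens the component doesn't see.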
The only problem with that approach would be document metadata that you want to use in the prompt.
Generally, I feel like this is less important now since the context length of most models has increased so much.
> This way, we don't need to change all the components one by one to adopt this, just add this component to the pipeline.
One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
> Generally, I feel like this is less important now since the context length of most models has increased so much.
And yeah, I agree with this; it has become less urgent now since context lengths are so large nowadays.
> One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.
> One other (small) issue I foresee when using a separate component is that it's not easy to know how much you should truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens the rest of the prompt in the PromptBuilder already uses, and that's not easily possible unless the functionality is added to the PromptBuilder.
> People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.
We could just add a count-tokens method to the prompt template that accepts a tokenizer and returns the number of tokens in the prompt after removing all the Jinja stuff.
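For instance, something along these lines; a sketch that uses Jinja2's `meta.find_undeclared_variables` to blank out the template variables before counting, with a made-up function name and example template:

```python
from jinja2 import Environment, meta
from transformers import AutoTokenizer


def count_template_tokens(template_str: str, tokenizer) -> int:
    """Count the tokens of the fixed part of a prompt template, with all Jinja variables blanked out."""
    env = Environment()
    variables = meta.find_undeclared_variables(env.parse(template_str))
    rendered = env.from_string(template_str).render({name: "" for name in variables})
    return len(tokenizer.encode(rendered, add_special_tokens=False))


template = (
    "Answer the question using only the documents below.\n"
    "{% for doc in documents %}{{ doc.content }}\n{% endfor %}"
    "Question: {{ question }}"
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
overhead = count_template_tokens(template, tokenizer)
# Whatever remains of the context window (minus the question and the tokens reserved
# for the answer) is the budget that could be handed to the truncater.
```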