
Question: How to get the token count of the prompt for maximum utilization of the context window during prompt compression of RAG pipeline?

Open theta-lin opened this issue 7 months ago • 0 comments

When running a DSPy module with a given signature, I am interested in getting the token count of the "prompt template" that it currently passes to the LM, by which I mean the number of input tokens passed to the LM minus the token counts of the input fields. This would thus count the length of the signature description, the field descriptions, and the few-shot examples.
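The measurement described above can be sketched as follows: render the prompt with every input field blanked out and count what remains (instructions, field prefixes, demos). The function and variable names here are illustrative, and the whitespace tokenizer is only a stand-in for the LM's real tokenizer (e.g. tiktoken for OpenAI models):

```python
# Sketch: estimate "template overhead" by rendering the template with all
# input fields empty, so only the fixed parts of the prompt are counted.

def count_tokens(text: str) -> int:
    # Stand-in tokenizer; real code would use the LM's own tokenizer.
    return len(text.split())

def template_overhead(template: str, input_fields: list[str]) -> int:
    """Tokens the template itself consumes, excluding the input fields."""
    blank = template.format(**{f: "" for f in input_fields})
    return count_tokens(blank)

template = (
    "Answer the question using the context.\n\n"
    "Context: {context}\nQuestion: {question}\nAnswer:"
)
overhead = template_overhead(template, ["context", "question"])
```

The same idea works with any prompt renderer: format once with empty inputs, count tokens, and treat the result as the fixed cost of each LM call.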

I am interested in this because I am currently building a RAG pipeline that retrieves texts from a database to synthesize the final response. However, the total length of the retrieved texts might exceed the context window of the LM I am using, so an iterative or recursive summarization step is needed to compress the prompt before synthesizing the final response. While I acknowledge that you could simply summarize each chunk of text one by one to be extra cautious about not exceeding the context window, I think this might not be the most effective approach.
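The iterative compression loop described above might look like the sketch below. The `summarize` function is a placeholder for an LM call (e.g. a DSPy summarization module); here it just truncates so the example runs standalone, and all names are illustrative:

```python
# Sketch of iterative summarization: repeatedly compress the chunks until the
# combined text fits the token budget.

def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def summarize(text: str, max_tokens: int) -> str:
    # Placeholder for an LM summarization call; truncation keeps the
    # example self-contained.
    return " ".join(text.split()[:max_tokens])

def compress(chunks: list[str], budget: int, per_chunk_budget: int) -> str:
    text = " ".join(chunks)
    while count_tokens(text) > budget:
        # Each pass would be a round of LM calls in a real pipeline.
        chunks = [summarize(c, per_chunk_budget) for c in chunks]
        text = " ".join(chunks)
    return text

result = compress(["a b c d e", "f g h i j"], budget=4, per_chunk_budget=2)
```

A real implementation would also pack several chunks into each summarization call instead of summarizing them one by one, which is exactly the packing behavior discussed next.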

I originally built the RAG pipeline entirely in LlamaIndex, where the response is generated by response synthesizers. Note that the compact mode of response synthesizers tries to pack as many tokens from the retrieved contexts into a single LM call as possible to reduce the number of calls. This is achieved via PromptHelper, which squeezes as many tokens into the fields of the prompt template as possible so that the combined length of the fields does not exceed context_window - prompt_template_length.
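The packing behavior can be sketched as a simple greedy loop (illustrative names, not LlamaIndex's actual API; the whitespace tokenizer stands in for a real one):

```python
# Sketch: greedily fill each LM call with as many chunks as fit in the
# budget, where budget = context_window - template_tokens.

def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def pack_chunks(chunks: list[str], context_window: int,
                template_tokens: int) -> list[list[str]]:
    budget = context_window - template_tokens
    batches, current, used = [], [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if current and used + n > budget:
            # Current batch is full; start a new one. Note: a single chunk
            # larger than the budget is not split here, it gets its own batch.
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        batches.append(current)
    return batches

batches = pack_chunks(
    ["one two three", "four five", "six seven eight nine"],
    context_window=8, template_tokens=2,
)
```

Each inner list is then one LM call, so the number of calls drops compared to summarizing each chunk separately.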

Now, as I am switching all the prompting to DSPy for more flexibility, I wonder what would be the best way to implement something like PromptHelper? I also checked how the LlamaIndex integration for DSPy handles this: https://github.com/stanfordnlp/dspy/blob/55510eec1b83fa77f368e191a363c150df8c5b02/dspy/predict/llamaindex.py#L22-L36 It appears to convert the signature to a legacy format first. Would this be a good approach to the problem, or are there better alternatives?

Related issues:

- #101: I understand I could reduce the number of bootstrapped demos included, but I need to work with long contexts regardless.
- #381: The author of that issue also wants some kind of prompt compression, but there has been no update so far.

theta-lin · Jul 04 '24 13:07