Tips for long-form generation (getting around max_tokens error)
Hi, I'm using DSPy to extract structured information from chunks of large documents. I often run into a max_tokens error while using gemini-1.5-pro, which has a max_output_tokens limit of 8,192. Is there any guidance on using DSPy to make multiple LLM calls so the extraction covers the full chunk without hitting the max_tokens error?
My initial idea for a fix:

Step 1 - full chunk input -> first LLM call -> generate output up to position x1 of the chunk -> first LLM output
Step 2 - full chunk input + first LLM output -> generate output after x1 but before x2 -> second LLM output
Step 3 - full chunk input + second LLM output -> generate output after x2 but before x3 -> third LLM output
...
Step N - full chunk input + (N-1)th LLM output -> generate output after position x_{N-1} up to the end of the chunk -> Nth LLM output
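A minimal sketch of this sequential "continue from where you left off" idea in DSPy might look like the following. The signature, field names, and the `done` stopping flag are illustrative assumptions, not an established DSPy recipe; it also replaces the explicit positions x1..xN with a model-reported completion flag, which avoids having to track character offsets:

```python
import dspy

# Assumption: a Gemini-backed LM configured via LiteLLM-style model strings.
dspy.configure(lm=dspy.LM("gemini/gemini-1.5-pro"))

class ContinueExtraction(dspy.Signature):
    """Extract structured records from the chunk, continuing after the output so far."""
    chunk = dspy.InputField(desc="full document chunk")
    output_so_far = dspy.InputField(desc="extraction from earlier calls; empty on the first call")
    continuation = dspy.OutputField(desc="next records only, without repeating output_so_far")
    done = dspy.OutputField(desc="'yes' once the whole chunk is covered, else 'no'")

def extract_long(chunk: str, max_rounds: int = 10) -> str:
    extractor = dspy.Predict(ContinueExtraction)
    output = ""
    for _ in range(max_rounds):
        pred = extractor(chunk=chunk, output_so_far=output)
        output += pred.continuation + "\n"
        if pred.done.strip().lower() == "yes":
            break
    return output
```

Each round stays well under the 8,192-token output cap because the model only emits a continuation, while the growing `output_so_far` keeps it from re-extracting earlier records.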
Any guidance on a better way? Perhaps running all N partitions of the chunk in parallel and then de-duplicating and combining the outputs as a post-processing step would work better?
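For the parallel variant, a sketch (again with a hypothetical signature and helper names) could partition the chunk, extract from each partition concurrently, and merge the results with naive exact-match de-duplication; records that straddle partition boundaries would likely need fuzzier matching or overlapping partitions:

```python
from concurrent.futures import ThreadPoolExecutor
import dspy

class ExtractRecords(dspy.Signature):
    """Extract structured records from one partition of a chunk."""
    partition = dspy.InputField(desc="one contiguous slice of the chunk")
    records = dspy.OutputField(desc="one extracted record per line")

def extract_parallel(chunk: str, n_parts: int = 4) -> list[str]:
    extractor = dspy.Predict(ExtractRecords)
    size = max(1, len(chunk) // n_parts)
    parts = [chunk[i:i + size] for i in range(0, len(chunk), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(lambda p: extractor(partition=p).records, parts)
    # Naive de-duplication: drop exact-duplicate lines, preserving order.
    seen, merged = set(), []
    for res in results:
        for line in res.splitlines():
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged
```

The trade-off versus the sequential scheme: parallel calls cut wall-clock time, but each call loses the context of the other partitions, so the de-duplication step has to absorb any overlap.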
Thanks