Jan Philipp Harries
**EDIT: My original tool definition doesn't work anymore as of 0.0.162, code updated.**

> Also, same question as @blazickjp: is there a way to add chat memory to this?...
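A minimal sketch of one way to do that with the 0.0.16x-era API; the `Lookup` tool and both prompts are hypothetical placeholders, not the code from the original answer:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Hypothetical placeholder tool; substitute your own function here.
def lookup(query: str) -> str:
    return "stub result for: " + query

tools = [Tool(name="Lookup", func=lookup, description="Looks up facts about a topic.")]

# The conversational agent expects the history under the key "chat_history".
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = initialize_agent(
    tools,
    ChatOpenAI(temperature=0),
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True,
)

agent.run("Hi, my name is Jan.")
agent.run("What is my name?")  # answered from the chat memory, not the tool
```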
> @jpdus Thank you! I'm still very confused on the design aspect of this; why an agent would be needed for something that feels very much like a chain.

Well...
Well, I stand somewhat corrected: in the meantime there is already another chain that probably does what you want without invoking an agent (even if I'd probably prefer that...
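A minimal sketch, assuming the chain meant here is `ConversationalRetrievalChain` (retrieval QA with built-in chat history, no agent loop); the `vectorstore` is assumed to exist already:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),  # assumes a vectorstore built elsewhere
    memory=memory,
)

result = qa({"question": "What does the document say about X?"})
print(result["answer"])
```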
Sure, you can customize the QA prompt the same way as written above, and the "condense_prompt" for the "question generator" (see the example in the documentation) can be changed as well...
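For illustration, a hedged sketch of overriding both prompts on `ConversationalRetrievalChain`; the prompt texts are placeholders, and `combine_docs_chain_kwargs` may not be available in every 0.0.x release:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Placeholder prompt for the "question generator" (condensing the follow-up).
CONDENSE_PROMPT = PromptTemplate.from_template(
    "Given the conversation below, rephrase the follow-up question as a "
    "standalone question.\n\nChat history:\n{chat_history}\n"
    "Follow-up question: {question}\nStandalone question:"
)

# Placeholder QA prompt applied to the retrieved documents.
QA_PROMPT = PromptTemplate.from_template(
    "Answer using only the context below.\n\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),  # vectorstore and memory as above
    memory=memory,
    condense_question_prompt=CONDENSE_PROMPT,
    combine_docs_chain_kwargs={"prompt": QA_PROMPT},
)
```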
@drahoslavzan @hetthummar @olaf-hoops you are right, the code doesn't work anymore in the current version (0.0.162 as of now). There were multiple breaking changes, and I can't exactly figure out...
> Hey @jpdus, not to derail the conversation here (which is great). But is there a way to use Structured Response Parser with `load_qa_with_sources_chain`?

There surely is some way...
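One plausible route (a sketch, not necessarily the only way): embed a `StructuredOutputParser`'s format instructions into a custom prompt for the "stuff" variant and parse `output_text` afterwards. The schema, document, and question are placeholders:

```python
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.prompts import PromptTemplate
from langchain.schema import Document

parser = StructuredOutputParser.from_response_schemas([
    ResponseSchema(name="answer", description="The answer to the question."),
    ResponseSchema(name="sources", description="Comma-separated list of sources used."),
])

# The "stuff" variant expects {summaries} and {question} as input variables.
prompt = PromptTemplate(
    template=(
        "Use the extracted document parts below to answer the question.\n"
        "{format_instructions}\n\n{summaries}\n\nQuestion: {question}\n"
    ),
    input_variables=["summaries", "question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = load_qa_with_sources_chain(
    ChatOpenAI(temperature=0), chain_type="stuff", prompt=prompt
)

docs = [Document(page_content="LangChain is a framework.", metadata={"source": "doc-1"})]
raw = chain({"input_documents": docs, "question": "What is LangChain?"})
print(parser.parse(raw["output_text"]))  # dict with "answer" and "sources"
```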
@tomatefarcie123 Yes, I tried my example above in async and it works. Please note that async execution and streaming are two related but different things, and that not all chains...
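To illustrate the distinction, a small sketch: streaming comes from the `streaming` flag plus a callback handler, while async execution is a separate code path (`arun`/`acall`); here both are combined. Assumes an OpenAI key is configured:

```python
import asyncio

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Streaming: tokens are pushed to the callback handler as they arrive.
llm = ChatOpenAI(
    temperature=0,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Explain {topic} in one sentence."),
)

async def main() -> None:
    # Async: the chain runs without blocking the event loop; with the
    # streaming handler above, tokens are still printed as they stream in.
    await chain.arun(topic="async vs. streaming")

asyncio.run(main())
```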
@federicotorrielli I had the same problem (but didn't need to initialize both models in parallel, only sequentially). For me a workaround is:

```python
from vllm import LLM, SamplingParams
import gc
import ...
```
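For completeness, a minimal runnable sketch of this kind of workaround, assuming the goal is to free GPU memory between two sequential model loads; the model names are placeholders, and the `destroy_model_parallel` import path has moved between vLLM releases:

```python
from vllm import LLM, SamplingParams
import gc
import torch
# Import path valid for early vLLM releases; it has moved in later versions.
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

params = SamplingParams(temperature=0.0, max_tokens=128)

# First model (placeholder name).
llm = LLM(model="mistralai/Mistral-7B-v0.1")
print(llm.generate(["Hello"], params)[0].outputs[0].text)

# Tear down the first engine so the second one can claim the GPU memory.
destroy_model_parallel()
del llm
gc.collect()
torch.cuda.empty_cache()

# Second model, loaded sequentially (placeholder name).
llm = LLM(model="meta-llama/Llama-2-7b-hf")
print(llm.generate(["Hello again"], params)[0].outputs[0].text)
```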
I have the same problem, amplified by a SystemMessage in German. The model often forgets the correct tokens for Tool usage or the final answer, resulting in the parsing error....
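One hedged mitigation, assuming `handle_parsing_errors` exists in your LangChain version (it landed around this era; if not, a custom output parser is the alternative): let the `AgentExecutor` feed parsing failures back to the model instead of raising, and keep the format instructions in English even when the rest of the system message is German:

```python
from langchain.agents import AgentType, initialize_agent

# `tools` and `llm` as defined earlier.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    # Feed the parsing error back to the model instead of raising an
    # OutputParserException, so it can retry with the expected format.
    handle_parsing_errors=True,
)
```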
I have the same issue (not really critical, but annoying). Is there any possibility to fix it in the underlying code (e.g. with some comment in the `client` line)?