Issue with streaming with Ollama
https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/scripts/demo_streaming.py
I am trying to enable streaming for my project based on the example above, but somehow I can't get it working. I am using an Ollama Llama 2 model. My generation code:
```python
import asyncio

from nemoguardrails.context import streaming_handler_var
from nemoguardrails.streaming import StreamingHandler

# `app` is the LLMRails instance created elsewhere from my config.
history = [{"role": "user", "content": "Is your university located in Scotland?"}]

streaming_handler = StreamingHandler()
streaming_handler_var.set(streaming_handler)

async def process_tokens():
    async for chunk in streaming_handler:
        print(f"CHUNK: {chunk}")
        # Or do something else with the token

asyncio.create_task(process_tokens())

result = await app.generate_async(
    messages=history, streaming_handler=streaming_handler
)
print(result)
```
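
For reference, this is roughly how I create the app. From the docs I understand streaming also has to be enabled in the rails config itself, so my setup looks something like this (a minimal sketch; the engine/model values reflect my setup, everything else is assumed):

```python
from nemoguardrails import LLMRails, RailsConfig

# Minimal sketch of the rails config; streaming must be set to True here,
# otherwise generate_async will not stream token by token.
yaml_content = """
models:
  - type: main
    engine: ollama
    model: llama2
streaming: True
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
app = LLMRails(config)
```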
And my custom action is:
```python
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama  # langchain.llms on older versions
from langchain_core.runnables import RunnableConfig

# `retriever` and `PROMPT` are defined elsewhere in my project.
async def chain(query):
    o_llm = Ollama(model="llama2", base_url="https://my-driven-ollama.com/")
    chain = RetrievalQA.from_chain_type(
        llm=o_llm,
        chain_type="stuff",
        retriever=retriever,
        chain_type_kwargs={"prompt": PROMPT},
    )
    # Pass the rails streaming handler through as a callback so the
    # chain's LLM tokens are forwarded to it.
    call_config = RunnableConfig(callbacks=[streaming_handler_var.get()])
    response = await chain.ainvoke(query, config=call_config)
    return response["result"]
```
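
The action is registered on the app before generating, along these lines (a one-line sketch; the action name is just what my flow calls):

```python
# Register the custom action so that flows can invoke it by name.
app.register_action(chain, name="chain")
```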
@drazvan Any idea why this is happening, or am I missing something?
I'm also thinking of returning the source documents in the output. If I add `return_source_documents=True`, I get an error along the lines of "expected str, found dict".
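
For reference, what I expected to work is something like the following tail of the action (a sketch, assuming the chain then returns a dict with "result" and "source_documents" keys):

```python
    # Hypothetical tail of the action with return_source_documents=True:
    # the chain now returns a dict, so only the answer string goes back
    # to the rails runtime and the sources are kept on the side.
    response = await chain.ainvoke(query, config=call_config)
    answer = response["result"]
    source_docs = response.get("source_documents", [])
    return answer
```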