Issue with streaming with Ollama
https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/scripts/demo_streaming.py
I am trying to enable streaming for my project based on the example above, but somehow I can't get it working. I am using an Ollama Llama 2 model. My generation code:
```python
import asyncio

from nemoguardrails.context import streaming_handler_var
from nemoguardrails.streaming import StreamingHandler

# `app` is the LLMRails instance created elsewhere from my config.
history = [{"role": "user", "content": "Is your university located in Scotland?"}]

streaming_handler = StreamingHandler()
streaming_handler_var.set(streaming_handler)

async def process_tokens():
    async for chunk in streaming_handler:
        print(f"CHUNK: {chunk}")
        # Or do something else with the token

asyncio.create_task(process_tokens())

result = await app.generate_async(
    messages=history, streaming_handler=streaming_handler
)
print(result)
```
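
For reference, this is roughly how I create the app. From the docs I understand streaming also has to be enabled in the rails config itself, so my setup looks something like this (a minimal sketch; the engine/model values reflect my setup, everything else is assumed):

```python
from nemoguardrails import LLMRails, RailsConfig

# Minimal sketch of the rails config; streaming must be set to True here,
# otherwise generate_async will not stream token by token.
yaml_content = """
models:
  - type: main
    engine: ollama
    model: llama2
streaming: True
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
app = LLMRails(config)
```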
And my custom action is:
```python
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama  # langchain.llms on older versions
from langchain_core.runnables import RunnableConfig

# `retriever` and `PROMPT` are defined elsewhere in my project.
async def chain(query):
    o_llm = Ollama(model="llama2", base_url="https://my-driven-ollama.com/")
    chain = RetrievalQA.from_chain_type(
        llm=o_llm,
        chain_type="stuff",
        retriever=retriever,
        chain_type_kwargs={"prompt": PROMPT},
    )
    # Pass the rails streaming handler through as a callback so the
    # chain's LLM tokens are forwarded to it.
    call_config = RunnableConfig(callbacks=[streaming_handler_var.get()])
    response = await chain.ainvoke(query, config=call_config)
    return response["result"]
```
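
The action is registered on the app before generating, along these lines (a one-line sketch; the action name is just what my flow calls):

```python
# Register the custom action so that flows can invoke it by name.
app.register_action(chain, name="chain")
```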
@drazvan Any idea why this is happening, or am I missing something?
I'm also thinking of returning the source documents in the output. If I add `return_source_documents=True`, I get an error along the lines of "expected str, found dict".
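
For reference, what I expected to work is something like the following tail of the action (a sketch, assuming the chain then returns a dict with "result" and "source_documents" keys):

```python
    # Hypothetical tail of the action with return_source_documents=True:
    # the chain now returns a dict, so only the answer string goes back
    # to the rails runtime and the sources are kept on the side.
    response = await chain.ainvoke(query, config=call_config)
    answer = response["result"]
    source_docs = response.get("source_documents", [])
    return answer
```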