NeMo-Guardrails RunnableRails performance weirdness

res = guardrails.invoke({"input": "How do I cook meat"}) takes 0.5s

Next I only define a chain, without using it. The llm in the chain is local, while the LLM in the yml file is OpenAI.

chain = print_func | (guardrails | llm) | print_func | extract_output

res = guardrails.invoke({"input": "How do I cook meat"}) takes 35s, and the answer is incorrect

chain.invoke(...) takes 35s, same answer

Restart:

res = guardrails.invoke({"input": "How do I cook meat"}) takes 0.5s

chain = print_func | llm | print_func | extract_output

chain2 = guardrails | llm

res = guardrails.invoke({"input": "How do I cook meat"}) takes 35s, incorrect result

Note: it looks like as soon as I attach the runnable to a chain, invoking the runnable triggers the local LLM, even though the chain itself is never used.

May 10 '24 15:05 pechaut78

Note:

guardrails = RunnableRails(config=config, passthrough=False)

May 10 '24 15:05 pechaut78

Note:

If I do:

chain = print_func | llm | print_func | extract_output

chain2 = guardrails | chain

there is no slowdown.

But with:

chain = print_func | llm | print_func | extract_output

chain2 = guardrails | llm

there is a slowdown.

May 10 '24 15:05 pechaut78

Hi @pechaut78! You are correct that the behavior is weird. This is because the "|" operator actually mutates the RunnableRails instance (https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/integrations/langchain/runnable_rails.py#L86). This is a bug: it should create a new instance. We'll fix this for the next release. Thanks for reporting!

May 10 '24 16:05 drazvan
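
To illustrate the mutation described in the reply above, here is a toy sketch in plain Python. It does not use NeMo-Guardrails or LangChain: MutatingRails and CopyingRails are hypothetical stand-ins, and the assumption that "|" stores its right-hand operand as the instance's LLM is made only for illustration, not taken from the library source.

```python
class MutatingRails:
    """Hypothetical stand-in for a runnable whose "|" mutates itself."""

    def __init__(self, llm):
        self.llm = llm  # e.g. the LLM named in the yml config

    def __or__(self, other):
        # Buggy pattern (assumed for illustration): composing stores
        # `other` on the existing instance and returns that same instance.
        self.llm = other
        return self

    def invoke(self, prompt):
        return f"{self.llm}: {prompt}"


class CopyingRails(MutatingRails):
    """Same toy class, but "|" returns a new instance instead."""

    def __or__(self, other):
        # Fixed pattern: build a new object and leave `self` untouched.
        return CopyingRails(other)


buggy = MutatingRails("remote-llm")
_ = buggy | "local-llm"        # a chain is defined but never invoked
print(buggy.invoke("hi"))      # the original instance now uses local-llm

fixed = CopyingRails("remote-llm")
chain = fixed | "local-llm"
print(fixed.invoke("hi"))      # the original instance still uses remote-llm
```

In the toy version, merely writing buggy | "local-llm" changes what buggy.invoke does afterwards, which matches the symptom reported above: defining a chain is enough to change the behavior of the standalone runnable.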