[BUG] colang content causing streaming issues
Hello, I have the sample code below that I am trying to run. It's based on the streaming docs. However, I'm hitting issues with streaming the output of LLMRails when using colang content. Do I have something set up incorrectly in my colang_content?
```
pip install --upgrade --force-reinstall nemoguardrails openai==0.27.8
```
```python
import asyncio
import logging

from langchain_community.chat_models import ChatOpenAI
from nemoguardrails import LLMRails, RailsConfig
from nemoguardrails.streaming import StreamingHandler

logging.basicConfig(level=logging.INFO)

openai_api_key = 'sk-...'  # YOUR_API_TOKEN

llm = ChatOpenAI(model_name='gpt-3.5-turbo-16k',
                 openai_api_key=openai_api_key,
                 temperature=0,
                 streaming=True)

YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-16k
    streaming: True
"""

colang_content = """
define user ask weather
  "how is the weather today?"
  "should I wear a coat?"

define bot answer weather service down
  "Unfortunately, I'm unable to access my weather service API at the moment. This means I can't provide you with the real-time weather information such as weather conditions or forecasts for your area"

define flow weather
  user ask weather
  bot answer weather service down

define flow
  user ...
  bot greeting
"""


async def demo_1():
    """Demo using the streaming of response chunks directly."""
    config = RailsConfig.from_content(yaml_content=YAML_CONFIG, colang_content=colang_content)
    # config = RailsConfig.from_content(yaml_content=YAML_CONFIG)  # allows streaming but no colang
    app = LLMRails(config, llm=llm, verbose=False)

    history = [{"role": "user", "content": "tell me a story about Unicorns?"}]
    async for chunk in app.stream_async(messages=history):
        print(chunk, end="")


asyncio.run(demo_1())
```
I get the entire result at the end rather than each chunk being printed as it is processed. Is that the expected output? Could it be that I'm using a different version of the Colang language than LLMRails is expecting? Thank you for your time!
It seems to be an issue with

```
define user ask weather
  "how is the weather today?"
  "should I wear a coat?"
```

When that chunk is removed from colang_content, streaming works. I wonder why, though. Proceeding with the removal would also take away the ability to match semantically similar user questions about the current weather.
@drazvan I wanted to bump this and see if there's anything I can do to circumvent this unexpected result on the latest version of guardrails.
Thank you!
Hi @shiv248!
The issue is due to the LLM not following the exact format when generating the response. Instead of generating `  "<message>"` (with two leading spaces before the opening quote), it generates `"<message>"`, i.e. it is missing the two leading spaces. If you want to test with a clone of the nemoguardrails repo, you can replace line 876 here:
https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/actions/llm/generation.py#L876
with
```python
streaming_handler.set_pattern(prefix='"', suffix='"')
```
(removed the two leading spaces for the prefix).
We'll try to figure out a solution for this, and hopefully it will make it into 0.10.0.
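Concretely, the suggested change amounts to something like the following sketch (the "before" line is inferred from the description above and may not match the current generation.py exactly):

```python
# Before: the streaming handler expects the bot message to start with two spaces
# followed by a double quote, which the LLM does not always produce.
streaming_handler.set_pattern(prefix='  "', suffix='"')

# After: drop the two leading spaces so the pattern matches what the LLM actually emits.
streaming_handler.set_pattern(prefix='"', suffix='"')
```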
@drazvan wow, that's an interesting find. What if I set it to the other case in that if/else statement by putting `output_parser: verbose_v1` in my config? Could that potentially fix it too, since that would make it

```python
streaming_handler.set_pattern(prefix='Bot message: "', suffix='"')
```

which I can still work around post-generation?
Yes, that is potentially a good workaround. If you override the prompt for gpt-3.5-turbo-16k similar to https://github.com/NVIDIA/NeMo-Guardrails/blob/main/nemoguardrails/llm/prompts/dolly.yml, then it could work.
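For reference, a rough sketch of what such a prompt override might look like, extending the YAML_CONFIG from the example above and following the structure of the linked dolly.yml. The `generate_bot_message` task name, the `openai/gpt-3.5-turbo-16k` model key, and the placeholder template body are assumptions; the real template content should be copied unchanged from the default prompts shipped with nemoguardrails, with only the `output_parser` line added:

```python
# Hypothetical sketch only: a prompt override that switches bot message generation
# to the verbose_v1 output parser, which the discussion above suggests makes the
# streamed output start with 'Bot message: "'. Task name, model key, and the
# placeholder template are assumptions based on the linked dolly.yml.
YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-16k
    streaming: True

prompts:
  - task: generate_bot_message
    models:
      - openai/gpt-3.5-turbo-16k
    output_parser: verbose_v1
    content: |-
      ... paste the default generate_bot_message template here, unchanged ...
"""
```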