Basic example broken with chat-type OpenAI models
The bug
OpenAI chat models are not compatible with the basic first example.
To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.
import guidance
# set the default language model used to execute guidance programs
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")
# define a guidance program that adapts a proverb
program = guidance("""Tweak this proverb to apply to model instructions instead.
{{proverb}}
- {{book}} {{chapter}}:{{verse}}
UPDATED
Where there is no guidance{{gen 'rewrite' stop="\\n-"}}
- GPT {{gen 'chapter'}}:{{gen 'verse'}}""")
# execute the program on a specific proverb
executed_program = program(
proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
book="Proverbs",
chapter=11,
verse=14
)
Output:
Tweak this proverb to apply to model instructions instead.
Where there is no guidance, a people falls,
but in an abundance of counselors there is safety.
- Proverbs 11:14
UPDATED
Where there is no guidance
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 94, in run
await self.visit(self.parse_tree)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 218, in visit
visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 218, in <listcomp>
visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 292, in visit
command_output = await command_function(*positional_args, **named_args)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/library/_gen.py", line 137, in gen
gen_obj = await parser.llm_session(
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 520, in __call__
out = self.llm.caller(**call_args)
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 307, in _library_call
kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 21, in prompt_to_messages
assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting."
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
Error in program: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[6], line 17
7 program = guidance("""Tweak this proverb to apply to model instructions instead.
8
9 {{proverb}}
(...)
13 Where there is no guidance{{gen 'rewrite' stop="\\n-"}}
14 - GPT {{gen 'chapter'}}:{{gen 'verse'}}""")
16 # execute the program on a specific proverb
---> 17 executed_program = program(
18 proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
19 book="Proverbs",
20 chapter=11,
21 verse=14
22 )
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program.py:233, in Program.__call__(self, **kwargs)
231 return self._stream_run(loop, new_program)
232 else:
--> 233 loop.run_until_complete(new_program.execute())
235 return new_program
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/nest_asyncio.py:90, in _patch_loop.<locals>.run_until_complete(self, future)
87 if not f.done():
88 raise RuntimeError(
89 'Event loop stopped before Future completed.')
---> 90 return f.result()
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/asyncio/futures.py:201, in Future.result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/asyncio/tasks.py:256, in Task.__step(***failed resolving arguments***)
252 try:
253 if exc is None:
254 # We use the `send` method directly, because coroutines
255 # don't have `__iter__` and `__next__` methods.
--> 256 result = coro.send(None)
257 else:
258 result = coro.throw(exc)
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program.py:384, in Program.execute(self)
382 else:
383 with self.llm.session(asynchronous=True) as llm_session:
--> 384 await self._executor.run(llm_session)
385 self._text = self._executor.prefix
387 # delete the executor and so mark the program as not executing
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:98, in ProgramExecutor.run(self, llm_session)
96 print(traceback.format_exc())
97 print("Error in program: ", e)
---> 98 raise e
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:94, in ProgramExecutor.run(self, llm_session)
88 self.llm_session = llm_session
89 try:
90 # first parse all the whitespace control
91 # self.whitespace_control_visit(self.parse_tree)
92
93 # now execute the program
---> 94 await self.visit(self.parse_tree)
95 except Exception as e:
96 print(traceback.format_exc())
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:429, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
427 else:
428 inner_prev_node = prev_node
--> 429 visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
430 # visited_children = [self.visit(child) for child in node.children]
432 if len(visited_children) == 1:
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:429, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
427 else:
428 inner_prev_node = prev_node
--> 429 visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
430 # visited_children = [self.visit(child) for child in node.children]
432 if len(visited_children) == 1:
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:218, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
216 # visit our children
217 self.block_content.append([])
--> 218 visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
219 self.block_content.pop()
220 out = "".join("" if c is None else str(c) for c in visited_children)
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:218, in <listcomp>(.0)
216 # visit our children
217 self.block_content.append([])
--> 218 visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
219 self.block_content.pop()
220 out = "".join("" if c is None else str(c) for c in visited_children)
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:429, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
427 else:
428 inner_prev_node = prev_node
--> 429 visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
430 # visited_children = [self.visit(child) for child in node.children]
432 if len(visited_children) == 1:
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py:292, in ProgramExecutor.visit(self, node, next_node, next_next_node, prev_node, parent_node, grandparent_node)
290 if inspect.iscoroutinefunction(command_function):
291 await asyncio.sleep(0) # give other coroutines a chance to run
--> 292 command_output = await command_function(*positional_args, **named_args)
293 else:
294 command_output = command_function(*positional_args, **named_args)
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/library/_gen.py:137, in gen(name, stop, stop_regex, save_stop_text, max_tokens, n, stream, temperature, top_p, logprobs, pattern, hidden, list_append, save_prompt, token_healing, _parser_context)
134 assert parser.llm_session is not None, "You must set an LLM for the program to use (use the `llm=` parameter) before you can use the `gen` command."
136 # call the LLM
--> 137 gen_obj = await parser.llm_session(
138 parser_prefix+prefix, stop=stop, stop_regex=stop_regex, max_tokens=max_tokens, n=n, pattern=pattern,
139 temperature=temperature, top_p=top_p, logprobs=logprobs, cache_seed=cache_seed, token_healing=token_healing,
140 echo=parser.program.logprobs is not None, stream=stream, caching=parser.program.caching
141 )
143 if n == 1:
144 generated_value = prefix
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py:520, in OpenAISession.__call__(self, prompt, stop, stop_regex, temperature, n, max_tokens, logprobs, top_p, echo, logit_bias, token_healing, pattern, stream, cache_seed, caching)
518 if logit_bias is not None:
519 call_args["logit_bias"] = {str(k): v for k,v in logit_bias.items()} # convert keys to strings since that's the open ai api's format
--> 520 out = self.llm.caller(**call_args)
522 except openai.error.RateLimitError:
523 await asyncio.sleep(3)
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py:307, in OpenAI._library_call(self, **kwargs)
304 assert openai.api_key is not None, "You must provide an OpenAI API key to use the OpenAI LLM. Either pass it in the constructor, set the OPENAI_API_KEY environment variable, or create the file ~/.openai_api_key with your key in it."
306 if self.chat_mode:
--> 307 kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
308 del kwargs['prompt']
309 del kwargs['echo']
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py:21, in prompt_to_messages(prompt)
18 def prompt_to_messages(prompt):
19 messages = []
---> 21 assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting."
23 pattern = r'<\|im_start\|>(\w+)(.*?)(?=<\|im_end\|>|$)'
24 matches = re.findall(pattern, prompt, re.DOTALL)
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Mac OSX
- Guidance Version (guidance.__version__): '0.0.56'
That is true, but a chat prompt would also fail on standard completion models, so either way the first example has to break on something :) ...if you think we can improve the error message I am all ears!
I see your point. However, there is a strong incentive to use OpenAI's chat model gpt-3.5-turbo instead of text-davinci-003 due to the 10x cost reduction in the chat variant. So I was thinking: since, behind the scenes, the chat models just construct a single prompt from the system and user prompts, perhaps the framework could allow some flexibility here. If the user provides a single prompt, it could simply go into the user message, with the system prompt either empty or something generic. Conversely, if the user and system prompts are provided separately and the model only accepts a single prompt, then perhaps the framework could join them with some appropriate section headings (ideally ones known to be common across the provider's model variants, but it does not have to be).
In this way, we can continue to benefit from Guidance at a cheaper cost. Does the motivation and what I am proposing make sense?
Generally, it seems like a missed opportunity to limit the chat functionality so much. I also noticed that gen inside the assistant role is completely neutered (e.g. it does not even support pattern). I am not sure of the implementation details, but I don't think there is a good fundamental reason why that should be the case?
I was thinking that if there is a good way to override gen, then I can easily modify it to do what I need (i.e. make the chat model behave as if it were a regular completion model). A rough sketch of what I mean is below.
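To make it concrete, this is roughly the behaviour I am imagining (a minimal sketch using the openai package directly, pre-1.0 API, completely independent of guidance internals; the function name is just illustrative):

import openai  # assumes openai<1.0 and OPENAI_API_KEY set in the environment

def complete_with_chat_model(prompt, model="gpt-3.5-turbo", **kwargs):
    # Wrap the plain completion-style prompt in a single user message...
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    # ...and hand back just the generated text, as if it were a normal completion.
    return response["choices"][0]["message"]["content"]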
I am not sure I entirely follow, but the issue here is a limitation of the OpenAI API: we can't give partial completions to the assistant role yet. This means all generations have to be done as the only subtag inside an assistant role. You can use the above style of prompt with a chat model, but you do have to put in chat tags and respect that limitation :) like this:
import guidance
# set the default language model used to execute guidance programs
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")
# define a guidance program that adapts a proverb
program = guidance("""
{{#system}}You are a helpful agent{{/system}}
{{#user}}Tweak this proverb to apply to model instructions instead.
{{proverb}}
- {{book}} {{chapter}}:{{verse}}
UPDATED
Where there is no guidance{{/user}}
{{#assistant}}{{gen 'rewrite' stop="\\n-"}}{{/assistant}}""")
# execute the program on a specific proverb
executed_program = program(
proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
book="Proverbs",
chapter=11,
verse=14
)
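Once it has run, the generated text can be read back off the executed program:

rewritten = executed_program["rewrite"]  # text produced by {{gen 'rewrite'}}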
Thank you very much for your patience and the example.
Could you possibly explain how I can adapt the json example to the chat model? Here is my attempt:
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
# define a guidance program that adapts a proverb
program = guidance("""
{{#system}}You are a helpful agent{{/system}}
{{#user}}
Please generate character profile for an RPG game in JSON format.
{{/user}}
{{#assistant}}
```json
{
"id": "{{id}}",
"description": "{{description}}",
"name": "{{gen 'name'}}",
"age": {{gen 'age' pattern='[0-9]+' stop=','}},
"armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
"weapon": "{{select 'weapon' options=valid_weapons}}",
"class": "{{gen 'class'}}",
"mantra": "{{gen 'mantra' temperature=0.7}}",
"strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
"items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}```
{{/assistant}}""")
# execute the program on a specific proverb
executed_program = program(
id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
description="A quick and nimble fighter.",
)
leads to
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
I was hoping there is a way to tell the framework "here is a chat model but just use it as if it was not", meaning, under the hood, every time there is a gen, it will get restructured to get generation from the assistant role (or whatever the model interface requires)... In absence of that, what is the proper way to make the above example work?
To be honest, I think the best approach is what I mentioned before, to write a wrapper to treat the chat model interface as if it is a simple completion interface.
I am also confused by this. The docs are a bit lacking in terms of explaining the error; I'm not sure what "partial assistant prompting" means exactly, as opposed to what is allowed! How would I use a chat model with this to get structured completions while also taking message history into account? I can always handle the message history manually.
@sam-cohan @slundberg
Yeah, I had a look. There are two potential ways this could be done:
Approach 1.) Tweaking the OpenAI class, so you can use gpt-3.5-turbo as a drop-in replacement for davinci.
https://github.com/microsoft/guidance/blob/main/guidance/llms/_openai.py
(Details below)
- The main change would be to add a chat_mode called "completion_with_chat".
- When this is set and the user attempts to use a chat model for simpler completion, then on the line with the OpenAI call you instead convert the prompt argument into a single-element array for messages, coming in from the user, to simulate conversation history.
- You would also drop the logprobs argument, which the chat API doesn't accept.
- It would only work for the simpler features of guidance, not the ones that need the logprobs. However, you can make the RPG and proverb examples work (for instance) using this conversion. We'd probably want to log a warning message about some behaviors maybe being unsupported in this mode.
2.) Write a utility class that rewrites a completion-targeted prompt into a ChatGPT-compatible one. Basically, no text is allowed between the assistant opening/closing tags, so it would have to be manually pulled out into the previous user message. A rough sketch of this rewriting idea is below.
If there's interest from the repository maintainers, I can open a PR for either approach over next weekend, but only if they think it's okay with their vision for the project.
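To sketch what I mean by 2), the naive version of the rewrite could look something like this (illustrative only; it only handles a template whose single {{gen ...}} sits at the very end, so a real utility would need to split the template into multiple user/assistant turns):

import re

def completion_template_to_chat(template):
    # Everything before the first {{gen ...}} becomes the user message, and the
    # {{gen ...}} itself goes alone into an assistant block, which is the only
    # placement the OpenAI chat API currently allows. Anything after the first
    # {{gen ...}} is simply dropped in this naive sketch.
    match = re.search(r"\{\{gen .*?\}\}", template)
    if match is None:
        raise ValueError("template contains no {{gen ...}} command")
    before, gen_tag = template[:match.start()], match.group(0)
    return (
        "{{#system}}You are a helpful assistant.{{/system}}\n"
        "{{#user}}" + before.strip() + "{{/user}}\n"
        "{{#assistant}}" + gen_tag + "{{/assistant}}"
    )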
Thanks for your input @aabdullah-getguru . I was suggesting 1) but yeah I agree it would be useful for the project maintainers to give thumbs up on whether this is ok with them. If not, would love to know what workarounds they suggest...
is there an update on this?
I may be far off the mark, but it seems one benefit of guidance is the re-use of key/value caches. I don't see how this is possible when using an API (like OpenAI or Anthropic); I would have thought it is only possible when running one's own LLM. Is that correct?
So I was confused to see gpt-4 and davinci in some of the examples...
If this is correct, perhaps adding a clarifying sentence to the start of the README would be of benefit.
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
This means that you can have only one gen statement in the assistant role. You cannot have a second gen or any other statements. Based on my tests:
- {{#assistant}} must be immediately followed by {{gen ... }}. No spaces, new lines, or any other characters in between.
- {{gen ... }} must be followed by {{/assistant}}, but you can have any plain text, spaces, or new lines between them. However, you cannot have any library function like another gen, a select, an if, etc.
I think the assertion also means that you cannot have any gen statement in other roles. But I'm not sure of this.
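For example, if I read the assertion correctly, the first program below should run, while the second should raise the same AssertionError, because the "Sure: " prefix means the prompt sent to the API no longer ends with the bare assistant tag (illustrative sketch only):

import guidance
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# OK: {{gen}} is the first thing inside the assistant block
ok = guidance("""
{{#user}}Say hello.{{/user}}
{{#assistant}}{{gen 'greeting'}}{{/assistant}}""")
ok()

# Fails: text before {{gen}} inside the assistant block triggers the assertion
broken = guidance("""
{{#user}}Say hello.{{/user}}
{{#assistant}}Sure: {{gen 'greeting'}}{{/assistant}}""")
broken()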
So the chat completions are limited to simple generations in the assistant role. You cannot do structured generations like the JSON @sam-cohan tried.
This is a HUGE bummer because I was able to successfully do structured JSON generations with the chat model using LangChain + Pydantic. So it should be possible, and I was looking forward to replicating it in Guidance.
Why do we have this assertion? assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat...
@sam-cohan @slundberg
Yeah, I had a look. There are two potential ways this could be done:
Approach 1.) Tweaking the OpenAI class, so you can use gpt-3.5-turbo as a drop-in replacement for davinci. https://github.com/microsoft/guidance/blob/main/guidance/llms/_openai.py (Details below)
- The main change would be to add a chat_mode called "completion_with_chat".
- When this is set and the user attempts to use a chat model for simpler completion, then on the line with the OpenAI call you instead convert the prompt argument into a single-element array for messages, coming in from the user, to simulate conversation history.
- You would also drop the logprobs argument, which the chat API doesn't accept.
- It would only work for the simpler features of guidance, not the ones that need the logprobs. However, you can make the RPG and proverb example work (for instance) using this conversion. We'd probably want to log a warning message about some behaviors maybe being unsupported in this mode.
2.) Write a utility class that rewrites a completion targeted prompt to a chatgpt compatible one. Basically no text between the assistant opening/closing tags, they'd have to be manually pulled out into the previous user message.
If there's interest from the repository maintainers, I can open a PR for either approach over next weekend, but only if they think its okay with their vision for the project.
Could you elaborate more on this, please? I only have access to gpt-3.5-turbo and need it to work on the RPG example. AFAIK, in chat mode only one gen statement is allowed inside the assistant role, so one can only generate a single value (one gen statement) per key. In order to fill a whole JSON, would I need to add a key to the JSON and place the gen statement where its value should be generated, then after getting that result, update the JSON and append the next key and gen statement, and repeat until done? A sketch of what I picture is below.
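For what it's worth, here is roughly what I have in mind (untested sketch against 0.0.x; the prompts and field names are just illustrative): give each value its own assistant turn so that every {{gen}} is the only thing inside its assistant block, then assemble the JSON in plain Python afterwards.

import guidance

guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# One {{gen}} per assistant block; each field is produced by its own chat turn.
program = guidance("""
{{#system}}You are creating an RPG character. Answer each question with the value only.{{/system}}
{{#user}}The character is: {{description}}. What is the character's name?{{/user}}
{{#assistant}}{{gen 'name' max_tokens=10}}{{/assistant}}
{{#user}}What is the character's class?{{/user}}
{{#assistant}}{{gen 'class' max_tokens=10}}{{/assistant}}
{{#user}}What is the character's mantra?{{/user}}
{{#assistant}}{{gen 'mantra' temperature=0.7}}{{/assistant}}""")

out = program(description="A quick and nimble fighter.")
profile = {
    "description": "A quick and nimble fighter.",
    "name": out["name"].strip(),
    "class": out["class"].strip(),
    "mantra": out["mantra"].strip(),
}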
any support or updates for the newest guidance version?
any updates on this one please?