FewShotPromptTemplate bug on examples with JSON strings
System Info
Consider the Python code snippet in the Reproduction section below.
Running it raises an error because the JSON strings are not being parsed correctly: the template tries to look up a variable named "person", when that is just a key inside the JSON output we are using as an example.
In other words, the class confuses template variables with plain text inside a JSON string that we are feeding in as an example.
What is curious is that this does not happen with the PromptTemplate class, so it must be related to the iteration over the examples.
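To see why this happens, note that LangChain's default "f-string" templates are rendered with Python's `str.format`, which treats every unescaped brace pair as a placeholder. A minimal sketch of the underlying mechanism, independent of LangChain:

```python
# str.format treats {"person": ...} as a placeholder lookup,
# so a JSON key inside the template becomes a "variable" name
template = 'User: {query}\nAI: {"person": "Jack", "Location": "France"}'
try:
    rendered = template.format(query="my text number 1")
except KeyError as err:
    rendered = None
    print("KeyError:", err)  # the JSON key is mistaken for a template variable
```

This is the same failure mode the FewShotPromptTemplate hits when it formats each example through the example template.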
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [X] Prompts / Prompt Templates / Prompt Selectors
- [X] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
```python
from langchain import PromptTemplate
from langchain import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector
import json

# create our examples
examples = [
    {
        "query": "my text number 1",
        "answer": """{"person": "Jack", "Location": "France"}"""
    },
    {
        "query": "my text number 2",
        "answer": """{"person": "James", "Location": "Portugal"}"""
    }
]

# create an example template
example_template = """
User: {query}
AI: {answer}
"""

# create a prompt example from the above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

# now break our previous prompt into a prefix and a suffix
# the prefix is our instructions
task_description = """
Context for the task
"""

# and the suffix is our user input and output indicator
instruction = """
User: {query}
AI: """

# now create the few-shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=task_description,
    suffix=instruction,
    input_variables=["query"],
    example_separator="\n\n"
)

print(few_shot_prompt_template.format(query="My awesome query"))
```
Expected behavior
The expected behavior would be to output a prompt with several examples in which the answer is a JSON string:

```
User: My text number 1
AI: {"person": "Jack", "Location": "France"}

User: My text number 2
AI: {"person": "James", "Location": "Portugal"}

User: My new input text
AI:
```
Escaping the { with {{ should work. However, that's only a temporary fix.
I was just thinking about that and came back here to see if there was some feedback. It works!!! Thanks for the validation 👍
I tried something similar in my use case:

```
User: My text number 1
AI: {{"person": "Jack", "Location": "France"}}
```

But yes, at this point it is a workaround. I'm not sure whether the package should provide an "escaping function" (adding the extra braces) or whether there could be another way to handle it in the inner workings.
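Such an "escaping function" could look like the following. This is a sketch, not part of LangChain; the helper name `escape_braces` is made up here:

```python
def escape_braces(text: str) -> str:
    """Double every brace so str.format emits it literally."""
    return text.replace("{", "{{").replace("}", "}}")

# the escaped answer now survives a str.format pass unchanged
answer = escape_braces('{"person": "Jack", "Location": "France"}')
print(("AI: " + answer).format())  # → AI: {"person": "Jack", "Location": "France"}
```

Applying this to each example's `answer` field before building the FewShotPromptTemplate avoids the KeyError.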
Hello @JoaoSilva02, I have the same problem with JSON input for a few-shot template in LangChain.
Here is my code; let me know if you have found a workaround:
```python
from langchain import PromptTemplate, FewShotPromptTemplate

examples = [
    {
        "input": "ET Dec23 Synth 57.0 / 60.0",
        "desired_output": {"test": [{"entity": "ET", "strategy": "Synth",
                                     "prices": [{"size": 57.0, "quantity": 60.0, "date": "Dec23"}]}]}
    }
]

template = """
Human: {input}
Assistant: {desired_output}
"""

prompt = PromptTemplate(input_variables=["input", "desired_output"], template=template)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=prompt,
    input_variables=["input"],
    suffix="Human: {input}\nAssistant:"
)

print(few_shot_prompt.format(input="FTSE EFP 11.5/12.25"))
```
The error is the following: `KeyError: "'test'"`
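One way to work around this particular case is to serialize the dict up front and escape the braces before handing it to the template. A sketch under that assumption (the helper `to_example_value` is made up here, not a LangChain API):

```python
import json

def to_example_value(obj) -> str:
    # serialize to JSON, then escape braces so the f-string template
    # doesn't mistake JSON keys for template variables
    return json.dumps(obj).replace("{", "{{").replace("}", "}}")

desired = {"test": [{"entity": "ET", "strategy": "Synth",
                     "prices": [{"size": 57.0, "quantity": 60.0, "date": "Dec23"}]}]}
example = {"input": "ET Dec23 Synth 57.0 / 60.0",
           "desired_output": to_example_value(desired)}

# the escaped value round-trips through str.format back to valid JSON
print(example["desired_output"].format())
```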
> Escaping the { with {{ should work. However, that's only a temporary fix.
Is it really "escaping"? I think LangChain then uses jinja2 templating, right?
However, that would be my solution: just use the jinja2 template format instead of f-string-style formatting. It can be configured in the constructor of the PromptTemplate.
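A sketch of that idea using jinja2 directly (LangChain's PromptTemplate accepts a `template_format="jinja2"` constructor argument to use this style, assuming a version that supports it):

```python
from jinja2 import Template

# jinja2 marks variables as {{ var }}, so single-brace JSON text
# passes through the template untouched
template = Template('User: {{ query }}\nAI: {"person": "Jack", "Location": "France"}')
out = template.render(query="my text number 1")
print(out)
```

With this format, the JSON examples need no escaping at all.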
Following. Did you find a solution to this issue? I have the same one... I tried removing the quotes in my JSON, but the result is the same. At this point I don't know what to do. Here is my code:
Code:
```python
examples = [
    {
        "question": "Create a Jira ticket to integrate my MySQL database to our current assets",
        "answer": """{{fields: {project: {key: "AJ"}, summary: "Create a Jira ticket to integrate my MySQL database into our current assets", issuetype: {name: "Story"}, priority: {name: "High"}, description: {type: "doc", version: 1, content: [{type}"""
    }
]

example_template = """User: {question}\nAI: {answer}"""

example_prompt = PromptTemplate(input_variables=["question", "answer"], template=example_template)

task_description = """You're an AI Jira assistant specialized in creating tickets"""

instruction = """
User: {question}
AI: """

few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=task_description,
    suffix=instruction,
    input_variables=["question"],
    example_separator="\n\n"
)

few_shot_prompt_template.format(question="Create a Jira ticket to develop a new recommendation system for movies")
```
Error:

```
File ~/miniconda3/envs/torch/lib/python3.10/string.py:270, in Formatter.get_field(self, field_name, args, kwargs)
    267 def get_field(self, field_name, args, kwargs):
    268     first, rest = _string.formatter_field_name_split(field_name)
--> 270     obj = self.get_value(first, args, kwargs)
    272     # loop through the rest of the field_name, doing
    273     # getattr or getitem as needed
    274     for is_attr, i in rest:

File ~/miniconda3/envs/torch/lib/python3.10/string.py:227, in Formatter.get_value(self, key, args, kwargs)
    225     return args[key]
    226 else:
--> 227     return kwargs[key]

KeyError: '"project"'
```
Problem solved: it is necessary to double all the curly braces.
Example:

```python
formatted_string = '{{"fields": {{"project": {{"key": "AJ"}}, "summary": "Create a Jira ticket to integrate my MySQL database into our current assets", "issuetype": {{"name": "Story"}}, "priority": {{"name": "High"}}, "description": {{"type": "doc", "version": 1, "content": [{{"type": "paragraph", "content": [{{"type": "text", "text": "As a developer, I want to integrate my MySQL database with our current assets to improve data management."}}]}}, {{"type": "heading", "attrs": {{"level": 2}}, "content": [{{"type": "text", "text": "Acceptance Criteria:"}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- The MySQL database is successfully integrated with the application."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Data can be efficiently stored and retrieved from the integrated MySQL database."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- The integration process follows best practices and security standards."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- The integration is documented for future reference."}}]}}, {{"type": "heading", "attrs": {{"level": 2}}, "content": [{{"type": "text", "text": "Subtasks:"}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Analyze the structure of the MySQL database."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Create integration scripts for data migration."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Implement data synchronization with the application."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Perform testing and quality assurance of the integration."}}]}}, {{"type": "paragraph", "content": [{{"type": "text", "text": "- Document the integration process and configurations."}}]}}]}}}}'
```
The problem with this, at least in my case, is that the LLM sometimes returns doubled {{ and }}, so the JsonOutputParser fails. My solution was to add a postprocessing step on the LLM output before it reaches the parser:
```python
from langchain.output_parsers.json import SimpleJsonOutputParser
from langchain.schema.runnable import RunnableLambda

def postprocess_llm_output(llm_output):
    # handle both chat-model messages (with .content) and plain strings
    text = getattr(llm_output, "content", llm_output)
    return text.replace("{{", "{").replace("}}", "}")

parser = SimpleJsonOutputParser()
chain = RunnableLambda(postprocess_llm_output) | parser
chain.invoke('{{"a": 3}}'), chain.invoke('{"a": 3}')
```
and in general:

```python
chain = prompt | llm | RunnableLambda(postprocess_llm_output) | parser
```
Of course, there's also RetryOutputParser, but it needs another LLM call for something that can be fixed programmatically.
I'd like LangChain to provide a solution out of the box, though, because this seems like a very common use case. Please let me know if there's one already and I missed it.