haystack
haystack copied to clipboard
fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility
Related Issues
- fixes https://github.com/deepset-ai/haystack/issues/7457 https://github.com/deepset-ai/haystack/issues/7455
Proposed Changes:
- Claude Compatibility: modified the behaviour so that (i) error template is now a single message with generated json, error and schema (ii) make it so that validated messages are always "Assistant" chatmessage (for next pipeline step) and validation_errors are always "User" chatmessage (for LLM loops)
- Recursive Loop in type conversion: used Claude OPUS to automatically generate a fix based on the written issue.
How did you test it?
Tested on my personal use-case and it solved my issues.
Notes for the reviewer
The behaviour is modified to only include the last messages from the conversation and not the whole history of messages (less cost for long pipeline loops, not necessary to have previous messages).
For the auto-generated fix for recursive, maybe the bug comes from the fact that sometimes json.loads(value)
output a string and needs to be called twice to get the actual dict/list in the string. This is weird, but I've seen it happen. I'm not sure about the fundamental difference to be honest. Maybe it doesn't work for nested json.
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title:
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
. - I documented my code
- I ran pre-commit hooks and fixed any issue
Looks good @lambda-science , would you please add a short reno note (see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md) and resolve these black issues :-)
Is this still relevant? Let's merge it or close.
It's missing reno note and unit tests. It's an important addition and would love to see @lambda-science push it towards the finish line 🏁
Sorry, it went out of my mind I will do it :)
I know why I stopped, because I had issue setting up the env (on windows). Now that all is set, I can see there was test failling (on top of black/reno missing) so I will work on it
Should be good now. I had to change the test a bit because as I explained I suggested to only validate latest message (and include only latest message for validation) to optimize cost of long loops ! Tell me if you agree or not. (So validation of multi-message history only return a list of 1 message)
Pull Request Test Coverage Report for Build 9610907174
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 52 unchanged lines in 2 files lost coverage.
- Overall coverage decreased (-0.2%) to 89.953%
Files with Coverage Reduction | New Missed Lines | % |
---|---|---|
core/component/component.py | 2 | 98.28% |
components/validators/json_schema.py | 50 | 0.0% |
<!-- | Total: | 52 |
Totals | |
---|---|
Change from base Build 9600865720: | -0.2% |
Covered Lines: | 6912 |
Relevant Lines: | 7684 |
💛 - Coveralls
Verified manually to work for OpenAI, Anthropic, and Cohere. The tests were:
OpenAI:
import json
from typing import List
from haystack import Pipeline
from haystack.components.converters import OutputAdapter
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.joiners import BranchJoiner
from haystack.components.validators import JsonSchemaValidator
from haystack.dataclasses import ChatMessage
person_schema = {
"type": "object",
"properties": {
"first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
},
"required": ["first_name", "last_name", "nationality"]
}
# Initialize a pipeline
pipe = Pipeline()
# Add components to the pipeline
pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
pipe.add_component('fc_llm', OpenAIChatGenerator(model="gpt-4o"))
pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
# And connect them
pipe.connect("adapter", "joiner")
pipe.connect("joiner", "fc_llm")
pipe.connect("fc_llm.replies", "validator.messages")
pipe.connect("validator.validation_error", "joiner")
result = pipe.run(data={"adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
print(json.loads(result["validator"]["validated"][0].content))
The output was:
{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American', 'alias': 'Spider-Man', 'occupation': 'Photographer', 'affiliations': ['Daily Bugle', 'Avengers'], 'abilities': ['Superhuman strength', 'Enhanced agility', 'Spider-sense', 'Ability to cling to surfaces', 'Web-shooting'], 'personal_info': {'age': 25, 'gender': 'Male', 'height': '5\'10"', 'weight': '167 lbs', 'eye_color': 'Hazel', 'hair_color': 'Brown'}, 'biography': {'origin': 'Bitten by a radioactive spider, high school student Peter Parker gained the speed, strength and powers of a spider.', 'uncle_ben_quote': 'With great power comes great responsibility.'}, 'relationships': {'aunt': 'May Parker', 'girlfriend': 'Mary Jane Watson', 'best_friend': 'Harry Osborn'}}
Anthropic:
import json
from typing import List
from haystack import Pipeline
from haystack.components.converters import OutputAdapter
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack.components.joiners import BranchJoiner
from haystack.components.validators import JsonSchemaValidator
from haystack.dataclasses import ChatMessage
person_schema = {
"type": "object",
"properties": {
"first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
},
"required": ["first_name", "last_name", "nationality"]
}
# Initialize a pipeline
pipe = Pipeline()
# Add components to the pipeline
pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
pipe.add_component('fc_llm', AnthropicChatGenerator(model="claude-3-5-sonnet-20240620"))
pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
# And connect them
pipe.connect("adapter", "joiner")
pipe.connect("joiner", "fc_llm")
pipe.connect("fc_llm.replies", "validator.messages")
pipe.connect("validator.validation_error", "joiner")
result = pipe.run(data={
"adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
print(json.loads(result["validator"]["validated"][0].content))
The output was:
{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American'}
And finally Cohere:
import json
from typing import List
from haystack import Pipeline
from haystack.components.converters import OutputAdapter
from haystack_integrations.components.generators.cohere import CohereChatGenerator
from haystack.components.joiners import BranchJoiner
from haystack.components.validators import JsonSchemaValidator
from haystack.dataclasses import ChatMessage
person_schema = {
"type": "object",
"properties": {
"first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
"nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
},
"required": ["first_name", "last_name", "nationality"]
}
# Initialize a pipeline
pipe = Pipeline()
# Add components to the pipeline
pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
pipe.add_component('fc_llm', CohereChatGenerator(model="command-r"))
pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
# And connect them
pipe.connect("adapter", "joiner")
pipe.connect("joiner", "fc_llm")
pipe.connect("fc_llm.replies", "validator.messages")
pipe.connect("validator.validation_error", "joiner")
result = pipe.run(data={"adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
print(json.loads(result["validator"]["validated"][0].content))
The output was:
{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American'}