haystack fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility

Related Issues

fixes https://github.com/deepset-ai/haystack/issues/7457 https://github.com/deepset-ai/haystack/issues/7455

Proposed Changes:

Claude Compatibility: modified the behaviour so that (i) error template is now a single message with generated json, error and schema (ii) make it so that validated messages are always "Assistant" chatmessage (for next pipeline step) and validation_errors are always "User" chatmessage (for LLM loops)
Recursive Loop in type conversion: used Claude OPUS to automatically generate a fix based on the written issue.

How did you test it?

Tested on my personal use-case and it solved my issues.

Notes for the reviewer

The behaviour is modified to only include the last messages from the conversation and not the whole history of messages (less cost for long pipeline loops, not necessary to have previous messages).

For the auto-generated fix for recursive, maybe the bug comes from the fact that sometimes json.loads(value) output a string and needs to be called twice to get the actual dict/list in the string. This is weird, but I've seen it happen. I'm not sure about the fundamental difference to be honest. Maybe it doesn't work for nested json.

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

Apr 17 '24 14:04 lambda-science

Looks good @lambda-science , would you please add a short reno note (see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md) and resolve these black issues :-)

Apr 22 '24 07:04 vblagoje

Is this still relevant? Let's merge it or close.

May 15 '24 19:05 masci

It's missing reno note and unit tests. It's an important addition and would love to see @lambda-science push it towards the finish line 🏁

May 16 '24 08:05 vblagoje

Sorry, it went out of my mind I will do it :)

May 16 '24 08:05 lambda-science

I know why I stopped, because I had issue setting up the env (on windows). Now that all is set, I can see there was test failling (on top of black/reno missing) so I will work on it

May 16 '24 09:05 lambda-science

All committers have signed the CLA.

May 16 '24 09:05 CLAassistant

Should be good now. I had to change the test a bit because as I explained I suggested to only validate latest message (and include only latest message for validation) to optimize cost of long loops ! Tell me if you agree or not. (So validation of multi-message history only return a list of 1 message)

May 16 '24 10:05 lambda-science

Pull Request Test Coverage Report for Build 9610907174

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
52 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.2%) to 89.953%

Files with Coverage Reduction	New Missed Lines	%
core/component/component.py	2	98.28%
components/validators/json_schema.py	50	0.0%
<!--	Total:	52

Totals
Change from base Build 9600865720:	-0.2%
Covered Lines:	6912
Relevant Lines:	7684

💛 - Coveralls

May 16 '24 14:05 coveralls

Verified manually to work for OpenAI, Anthropic, and Cohere. The tests were:

OpenAI:

   import json
   from typing import List
    
   from haystack import Pipeline
   from haystack.components.converters import OutputAdapter
   from haystack.components.generators.chat import OpenAIChatGenerator
   from haystack.components.joiners import BranchJoiner
   from haystack.components.validators import JsonSchemaValidator
   from haystack.dataclasses import ChatMessage
    
   person_schema = {
       "type": "object",
       "properties": {
           "first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
       },
       "required": ["first_name", "last_name", "nationality"]
   }
    
   # Initialize a pipeline
   pipe = Pipeline()
    
   # Add components to the pipeline
   pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
   pipe.add_component('fc_llm', OpenAIChatGenerator(model="gpt-4o"))
   pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
   pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
   # And connect them
   pipe.connect("adapter", "joiner")
   pipe.connect("joiner", "fc_llm")
   pipe.connect("fc_llm.replies", "validator.messages")
   pipe.connect("validator.validation_error", "joiner")
    
   result = pipe.run(data={"adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
    
   print(json.loads(result["validator"]["validated"][0].content))

The output was:

{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American', 'alias': 'Spider-Man', 'occupation': 'Photographer', 'affiliations': ['Daily Bugle', 'Avengers'], 'abilities': ['Superhuman strength', 'Enhanced agility', 'Spider-sense', 'Ability to cling to surfaces', 'Web-shooting'], 'personal_info': {'age': 25, 'gender': 'Male', 'height': '5\'10"', 'weight': '167 lbs', 'eye_color': 'Hazel', 'hair_color': 'Brown'}, 'biography': {'origin': 'Bitten by a radioactive spider, high school student Peter Parker gained the speed, strength and powers of a spider.', 'uncle_ben_quote': 'With great power comes great responsibility.'}, 'relationships': {'aunt': 'May Parker', 'girlfriend': 'Mary Jane Watson', 'best_friend': 'Harry Osborn'}}

Anthropic:

   import json  
   from typing import List
    
   from haystack import Pipeline
   from haystack.components.converters import OutputAdapter
   from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
   from haystack.components.joiners import BranchJoiner
   from haystack.components.validators import JsonSchemaValidator
   from haystack.dataclasses import ChatMessage
    
   person_schema = {
       "type": "object",
       "properties": {
           "first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
       },
       "required": ["first_name", "last_name", "nationality"]
   }
    
   # Initialize a pipeline
   pipe = Pipeline()
    
   # Add components to the pipeline
   pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
   pipe.add_component('fc_llm', AnthropicChatGenerator(model="claude-3-5-sonnet-20240620"))
   pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
   pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
   # And connect them
   pipe.connect("adapter", "joiner")
   pipe.connect("joiner", "fc_llm")
   pipe.connect("fc_llm.replies", "validator.messages")
   pipe.connect("validator.validation_error", "joiner")
    
   result = pipe.run(data={
                           "adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
    
   print(json.loads(result["validator"]["validated"][0].content))

The output was:

{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American'}

And finally Cohere:

   import json
   from typing import List
    
   from haystack import Pipeline
   from haystack.components.converters import OutputAdapter
   from haystack_integrations.components.generators.cohere import CohereChatGenerator
   from haystack.components.joiners import BranchJoiner
   from haystack.components.validators import JsonSchemaValidator
   from haystack.dataclasses import ChatMessage
    
   person_schema = {
       "type": "object",
       "properties": {
           "first_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "last_name": {"type": "string", "pattern": "^[A-Z][a-z]+$"},
           "nationality": {"type": "string", "enum": ["Italian", "Portuguese", "American"]},
       },
       "required": ["first_name", "last_name", "nationality"]
   }
    
   # Initialize a pipeline
   pipe = Pipeline()
    
   # Add components to the pipeline
   pipe.add_component('joiner', BranchJoiner(List[ChatMessage]))
   pipe.add_component('fc_llm', CohereChatGenerator(model="command-r"))
   pipe.add_component('validator', JsonSchemaValidator(json_schema=person_schema))
   pipe.add_component('adapter', OutputAdapter("{{chat_message}}", List[ChatMessage])),
   # And connect them
   pipe.connect("adapter", "joiner")
   pipe.connect("joiner", "fc_llm")
   pipe.connect("fc_llm.replies", "validator.messages")
   pipe.connect("validator.validation_error", "joiner")
    
   result = pipe.run(data={"adapter": {"chat_message": [ChatMessage.from_user("Create json from Peter Parker")]}})
    
   print(json.loads(result["validator"]["validated"][0].content))

The output was:

{'first_name': 'Peter', 'last_name': 'Parker', 'nationality': 'American'}

Jun 21 '24 08:06 vblagoje

haystack haystack copied to clipboard

fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility

Related Issues

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

Pull Request Test Coverage Report for Build 9610907174

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

haystack
haystack copied to clipboard