
Change approach to generate_structured

[Open] Nicba1010 opened this issue 8 months ago · 7 comments

The approach to implementing generate_structured seems really problematic. Because the structured pass re-sends only the flattened string output of generate_str to a second model call, it loses much of the detail from the original response.

When calling generate_str: [screenshot]

Same query with a simple wrapper for generate_structured: [screenshot]

Current implementation:

    async def generate_structured(
        self,
        message,
        response_model: Type[ModelT],
        request_params: RequestParams | None = None,
    ) -> ModelT:
        # First we invoke the LLM to generate a string response
        # We need to do this in a two-step process because Instructor doesn't
        # know how to invoke MCP tools via call_tool, so we'll handle all the
        # processing first and then pass the final response through Instructor
        response = await self.generate_str(
            message=message,
            request_params=request_params,
        )

        # Next we pass the text through instructor to extract structured data
        client = instructor.from_openai(
            OpenAI(
                api_key=self.context.config.openai.api_key,
                base_url=self.context.config.openai.base_url,
            ),
            mode=instructor.Mode.TOOLS_STRICT,
        )

        params = self.get_request_params(request_params)
        model = await self.select_model(params)

        # Extract structured data from natural language
        structured_response = client.chat.completions.create(
            model=model or "gpt-4o",
            response_model=response_model,
            messages=[
                {"role": "user", "content": response},
            ],
        )

        return structured_response

Nicba1010 · Mar 13 '25 19:03

And this is fairly consistent: generate_structured reliably produces worse responses.

Nicba1010 · Mar 13 '25 19:03

Thanks for raising this @Nicba1010. I have also been meaning to suggest updating these.

I think the instructor library is pretty standard, but both OpenAI and Anthropic have native solutions for this that could potentially be used instead (a rough sketch of the OpenAI variant follows the links):

  • https://platform.openai.com/docs/guides/structured-outputs?api-mode=chat
  • https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#json-output
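
For reference, a minimal sketch of the OpenAI-native path using the SDK's parse helper. This is illustrative only: the ExtractedData model and the prompt are placeholders, not anything from mcp-agent.

    from openai import OpenAI
    from pydantic import BaseModel

    # Hypothetical target schema, for illustration only.
    class ExtractedData(BaseModel):
        title: str
        summary: str

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize the attached report."}],
        # The SDK converts the Pydantic model into a strict JSON schema
        response_format=ExtractedData,
    )
    structured: ExtractedData = completion.choices[0].message.parsed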

mattmorgis · Mar 13 '25 20:03

I believe @evalstate has suggested the same thing. This is a good change to make. Any takers? :). Otherwise I'll take it next week.

saqadri · Mar 13 '25 21:03

Yep, I've seen the same with Instructor: it doubles up calls and generates results inconsistent with the other call. For the places where it is used, I don't think the complexity warrants it. @saqadri, in the fast-agent branch I've also moved to XML tags for those areas. Also of note, since it's a related area: Sonnet 3.7 over-plans and over-evaluates, so model-specific autogenerated prompts/rules may be necessary for a good experience.

evalstate · Mar 13 '25 21:03

btw, I've already removed it for Anthropic (based on the current outputs, I'm not worried about using the 'formal' structured outputs just yet) and can remove it for OAI this morning. This also allows a bump of the anthropic SDK, which Instructor had force-pinned to an earlier version.

evalstate · Mar 14 '25 08:03

OK, this is now merged to main in fast-agent and pushed in 0.1.8. Quick summary:

  • Instructor and dependencies removed; packages updated.
  • OpenAI endpoint now uses the chat completions parse method to generate structured content.
  • Anthropic API uses Pydantic with partial JSON parsing (sketch below).
  • Orchestrator, Eval/Opt and Router now all use these new methods natively.
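
Roughly, that Anthropic path looks like the following. This is a sketch under stated assumptions, not the actual fast-agent code: WeatherReport is an invented model, and the system prompt stands in for the autogenerated one.

    from anthropic import Anthropic
    from pydantic import BaseModel
    from pydantic_core import from_json

    class WeatherReport(BaseModel):  # hypothetical schema
        city: str
        temperature_c: float

    client = Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system='Reply with JSON only: {"city": <string>, "temperature_c": <number>}',
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    )
    # allow_partial tolerates a truncated tail, e.g. if max_tokens cut the JSON off
    data = from_json(message.content[0].text, allow_partial=True)
    report = WeatherReport.model_validate(data)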

The performance increase is..... yeah, massive. You definitely want this. Actual performance of the orchestrator in both modes also seems better.

Caveats/future TODOs:

  • Supply structured schema as tool definition (and specify tool) to Anthropic endpoint (see the sketch after this list). I haven't done this: the prompting seems to be more than adequate for both Haiku and Sonnet, and the generated prompts require very simple JSON.
  • Prompt Engineering for the autogenerated Router/Orchestrator/Eval-Optimizer prompts.
  • Have the prompted JSON format be generated using the Pydantic type rather than hard-coded.
  • Improve the Console Display of Workflow Agents (1) -> use the types to display clearly what is happening rather than leave it to the User to peer through XML/JSON.
  • Improve the Console Display of Workflow Agents (2) -> Truncate concatenated agent results, apply minor indentation to XML to make it easier to eyeball the overall content.
  • I've tested pretty extensively, but there are always edge cases. There was a bit of LLM-generated code that was overly defensive, which I've taken out, etc.
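
For the first TODO, the tool-based approach would look roughly like this. It is a sketch against the Anthropic Messages API, not committed code: record_result is an invented tool name and PlanStep an invented model. Deriving input_schema from the Pydantic type also covers the third TODO above.

    from anthropic import Anthropic
    from pydantic import BaseModel

    class PlanStep(BaseModel):  # hypothetical schema
        description: str
        agent: str

    client = Anthropic()
    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1024,
        tools=[{
            "name": "record_result",  # invented name, for illustration
            "description": "Record the structured result.",
            "input_schema": PlanStep.model_json_schema(),  # derived from the Pydantic type
        }],
        # Forcing this tool makes the reply arrive as schema-shaped JSON
        tool_choice={"type": "tool", "name": "record_result"},
        messages=[{"role": "user", "content": "Plan the next step."}],
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    step = PlanStep.model_validate(tool_use.input)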

Other things in mind:

  • Simplify the type hierarchy for Agents/AugmentedLLMs. These workflow components are really just Agents, and the hierarchy is inconsistent: e.g. router_llm doesn't derive from Augmented, while Orchestrator does.
  • For fast-agent, I am moving towards using MCP Types internally for the interfaces, and will be revisiting the use of Structured Outputs at the User-facing API level.

evalstate · Mar 15 '25 11:03

@evalstate would it be possible to create a targeted PR for that specifically into mcp_agent? I can also do it. Right now my concern is that there is so much change in fast-agent that it's making it hard to merge a lot of really great stuff in.

saqadri · Mar 16 '25 03:03