langchain Use jsonref to expand StructuredTool.args

With a nested args_schema, StructuredTool.args currently returns only the properties of the top-level object; it neither includes nor resolves JSON Schema references, e.g.:

> structured_tool.args
{
  "table_id": {
    "title": "Table Id",
    "type": "string"
  },
  "body": {
    "title": "Body",
    "description": "an API query in the required format",
    "allOf": [
      {
        "$ref": "#/definitions/ApiQuery"
      }
    ]
  }
}

This PR implements the second of three possible solutions I see:

1. Include the full schema with references (i.e. not just top level `properties`):

{
  "title": "TableQuery",
  "type": "object",
  "properties": {
    "table_id": {
      "title": "Table Id",
      "type": "string"
    },
    "body": {
      "title": "Body",
      "description": "an API query in the required format",
      "allOf": [
        {
          "$ref": "#/definitions/ApiQuery"
        }
      ]
    }
  },
  "required": [
    "table_id",
    "body"
  ],
  "definitions": {
    "ApiQuery": {
      "title": "ApiQuery",
      "type": "object",
      "properties": { ... }
    }
  }
}

Pros: Full schema is included with a smaller increase in token count
Cons: LLM may have a hard time understanding the references

2. Resolve (expand) the references using the `jsonref` package:

{
  "table_id": {
    "title": "Table Id",
    "type": "string"
  },
  "body": {
    "title": "Body",
    "description": "an API query in the required format",
    "allOf": [
      {
        "title": "ApiQuery",
        "type": "object",
        "properties": {
            ...
        }
      }
    ]
  }
}

Pros: Includes the full schema in a way that the LLM might understand
Cons: Token count bloat + adds new package dependency
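
To make the expansion concrete, here is a minimal stdlib-only sketch of what dereferencing local references amounts to. It is a stand-in for illustration, not the `jsonref` implementation: the helper name `resolve_local_refs` and the `filter` property inside `ApiQuery` are hypothetical, only `#/...` pointers into dict nodes are handled, and keys that sit next to a `$ref` are dropped.

```python
import json


def resolve_local_refs(schema: dict) -> dict:
    """Return a copy of `schema` with local "#/..." $ref pointers inlined."""

    def lookup(pointer: str):
        # Walk a JSON pointer such as "#/definitions/ApiQuery" from the root.
        node = schema
        for part in pointer[2:].split("/"):
            part = part.replace("~1", "/").replace("~0", "~")
            node = node[part]
        return node

    def expand(node, seen: frozenset):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                if ref in seen:
                    # A self-referencing model cannot be fully inlined.
                    raise ValueError(f"circular reference: {ref}")
                return expand(lookup(ref), seen | {ref})
            return {key: expand(value, seen) for key, value in node.items()}
        if isinstance(node, list):
            return [expand(item, seen) for item in node]
        return node

    return expand(schema, frozenset())


schema = {
    "title": "TableQuery",
    "type": "object",
    "properties": {
        "table_id": {"title": "Table Id", "type": "string"},
        "body": {
            "title": "Body",
            "description": "an API query in the required format",
            "allOf": [{"$ref": "#/definitions/ApiQuery"}],
        },
    },
    "required": ["table_id", "body"],
    "definitions": {
        "ApiQuery": {
            "title": "ApiQuery",
            "type": "object",
            # Hypothetical property, only here so the example has content.
            "properties": {"filter": {"title": "Filter", "type": "string"}},
        }
    },
}

expanded = resolve_local_refs(schema)["properties"]
print(json.dumps(expanded, indent=2))
```

The printed result has the same shape as the output shown for option 2, with the `ApiQuery` definition inlined under `allOf`.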

3. Warn or raise

A third option would be to raise an exception saying that the schema is too complicated and ask the user to override StructuredTool.args. This would have helped me a lot in debugging my agent. It also has the benefit of not breaking existing agents that rely on the old behaviour.

Who can review?

@hwchase17 @vowelparrot

May 20 '23 13:05 jarib

what would the logic for (3) be exactly?

May 20 '23 14:05 hwchase17

Pragmatically:

if "$ref" in self.args_schema.schema_json():
    ...  # raise or warn

More robustly we could recurse through the schema and look for key == "$ref" and value.startswith("#/definitions")
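
A rough sketch of that recursive check, assuming the schema is the plain dict returned by `args_schema.schema()`. The function name `has_local_ref` is made up for this example. Unlike the substring test, it does not fire on a description that merely mentions "$ref".

```python
from typing import Any


def has_local_ref(node: Any) -> bool:
    """Return True if any nested dict has a "$ref" key pointing into #/definitions."""
    if isinstance(node, dict):
        for key, value in node.items():
            if (
                key == "$ref"
                and isinstance(value, str)
                and value.startswith("#/definitions")
            ):
                return True
            if has_local_ref(value):
                return True
        return False
    if isinstance(node, list):
        return any(has_local_ref(item) for item in node)
    return False


nested = {
    "properties": {
        "body": {"allOf": [{"$ref": "#/definitions/ApiQuery"}]},
    }
}
flat = {
    "properties": {
        "note": {"type": "string", "description": "mentions $ref in prose only"},
    }
}
print(has_local_ref(nested), has_local_ref(flat))
```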

May 20 '23 14:05 jarib

Could we do that logic, and if we see a $ref, use jsonref instead of erroring? I don't love making jsonref a strict requirement, but it seems like a solid fallback for these edge cases.

May 21 '23 05:05 hwchase17

We could also return ["properties"] and ["definitions"] without dereferencing
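
As a small illustration of this alternative, assuming a pydantic v1 style schema dict with top-level "properties" and "definitions" keys (the helper name `reduced_args` is hypothetical):

```python
def reduced_args(schema: dict) -> dict:
    """Keep only "properties" and, when present, "definitions"; refs stay as-is."""
    reduced = {"properties": schema.get("properties", {})}
    if "definitions" in schema:
        reduced["definitions"] = schema["definitions"]
    return reduced


full_schema = {
    "title": "TableQuery",
    "type": "object",
    "properties": {
        "table_id": {"title": "Table Id", "type": "string"},
        "body": {"allOf": [{"$ref": "#/definitions/ApiQuery"}]},
    },
    "required": ["table_id", "body"],
    "definitions": {"ApiQuery": {"title": "ApiQuery", "type": "object"}},
}

reduced = reduced_args(full_schema)
print(sorted(reduced))
```

The references still point at `#/definitions/...`, so the model has to follow them itself, but the target definitions are at least present in the prompt.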

May 22 '23 02:05 vowelparrot

@hwchase17 I've implemented your suggestion of falling back to dereferencing with jsonref only if the schema includes $ref. Please check if I did it correctly regarding the optional dependency.

@vowelparrot Indeed – not sure if it would make the schema harder to understand for LLMs. The dereferencing at least appears to work very well for my use case.
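
For readers following along, the fallback with an optional dependency could look roughly like the sketch below. This is not the code from the PR: the function name `expand_args` is hypothetical, and it assumes jsonref 1.0 or later, where `replace_refs` is a module-level function that accepts `proxies` and `lazy_load`.

```python
import importlib.util
import json


def expand_args(schema: dict) -> dict:
    """Return top-level properties, dereferencing with jsonref only when needed."""
    properties = schema.get("properties", {})
    # Cheap check: no "$ref" key anywhere means nothing to resolve.
    if '"$ref"' not in json.dumps(schema):
        return properties
    if importlib.util.find_spec("jsonref") is None:
        raise ImportError(
            "The schema contains $ref entries; install jsonref "
            "(pip install jsonref) to expand them."
        )
    import jsonref  # optional dependency, imported only on this path

    # Assumption: jsonref >= 1.0; proxies=False asks for plain dicts.
    resolved = jsonref.replace_refs(schema, proxies=False, lazy_load=False)
    return resolved["properties"]


simple = {"properties": {"table_id": {"title": "Table Id", "type": "string"}}}
print(expand_args(simple))
```

Schemas without references never touch jsonref, so existing tools keep working when the package is not installed.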

May 22 '23 08:05 jarib

Anecdotally Davinci 3 and GPT-4 both handle reduced but not dereferenced specs nicely, but I'd assume most other models do not

May 22 '23 15:05 vowelparrot

@jarib Hi, could you please resolve the merge conflicts and address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!

Sep 13 '23 20:09 leo-gan

Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!

Nov 07 '23 04:11 efriis

I have a problem with the prompt: the request sent to the LLM doesn't contain any definitions. Could you please help me with this?

Feb 24 '24 04:02 eav-solution
