Use jsonref to expand StructuredTool.args
With a nested args_schema, StructuredTool.args currently returns only the properties of the top-level object, without either including or resolving JSON Schema references, e.g.:
> structured_tool.args
{
"table_id": {
"title": "Table Id",
"type": "string"
},
"body": {
"title": "Body",
"description": "an API query in the required format",
"allOf": [
{
"$ref": "#/definitions/ApiQuery"
}
]
}
}
This PR implements the second of three possible solutions I see:
1. Include the full schema with references (i.e. not just top level properties):
{
"title": "TableQuery",
"type": "object",
"properties": {
"table_id": {
"title": "Table Id",
"type": "string"
},
"body": {
"title": "Body",
"description": "an API query in the required format",
"allOf": [
{
"$ref": "#/definitions/ApiQuery"
}
]
}
},
"required": [
"table_id",
"body"
],
"definitions": {
"ApiQuery": {
"title": "ApiQuery",
"type": "object",
"properties": { ... }
}
}
}
- Pros: Full schema is included with a smaller token increase than expanding the references
- Cons: LLM may have a hard time understanding the references
2. Resolve (expand) the references using the jsonref package:
{
"table_id": {
"title": "Table Id",
"type": "string"
},
"body": {
"title": "Body",
"description": "an API query in the required format",
"allOf": [
{
"title": "ApiQuery",
"type": "object",
"properties": {
...
}
}
]
}
}
- Pros: Includes the full schema in a way that the LLM might understand
- Cons: Token count bloat + adds new package dependency
3. Warn or raise
A third option would be to raise an exception stating that the schema is too complicated and asking the user to override StructuredTool.args. This would have helped me a lot in debugging my agent. It also has the benefit of not breaking existing agents that rely on the old behaviour.
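To illustrate what option 2 produces, here is a minimal stdlib-only sketch of expanding local references. It is a hand-rolled stand-in for illustration, not jsonref's implementation and not the code in this PR; the helper names `resolve_pointer` and `expand_refs` and the `limit` property are hypothetical, and it assumes the schema is a plain dict with non-recursive `#/...` references.

```python
def resolve_pointer(document, pointer):
    """Follow a local pointer such as '#/definitions/ApiQuery'.

    Assumption: no JSON Pointer escapes (~0, ~1) occur in the path.
    """
    node = document
    for part in pointer.lstrip("#").strip("/").split("/"):
        if part:
            node = node[part]
    return node


def expand_refs(node, document):
    """Return a copy of `node` with local `$ref` objects replaced by their targets.

    Assumption: the schema is not self-referential; a recursive schema
    would make this function recurse without end.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            return expand_refs(resolve_pointer(document, ref), document)
        return {key: expand_refs(value, document) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, document) for item in node]
    return node


# Schema shaped like the TableQuery example; "limit" is a made-up property.
schema = {
    "title": "TableQuery",
    "type": "object",
    "properties": {
        "table_id": {"title": "Table Id", "type": "string"},
        "body": {"title": "Body", "allOf": [{"$ref": "#/definitions/ApiQuery"}]},
    },
    "definitions": {
        "ApiQuery": {
            "title": "ApiQuery",
            "type": "object",
            "properties": {"limit": {"type": "integer"}},
        }
    },
}

expanded_args = expand_refs(schema["properties"], schema)
```

After this runs, `expanded_args["body"]["allOf"][0]` holds the inlined ApiQuery definition instead of a `$ref` object, which is the shape shown under option 2.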
Who can review?
@hwchase17 @vowelparrot
what would the logic for (3) be exactly?
Pragmatically:
if "$ref" in self.args_schema.schema_json():
    # raise or warn
More robustly we could recurse through the schema and look for key == "$ref" and value.startswith("#/definitions")
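A rough sketch of the more robust check, assuming the schema is available as a plain dict (for example from `args_schema.schema()`). The function name `has_local_ref` is hypothetical and only illustrates the recursion described above.

```python
def has_local_ref(node):
    """Return True if any nested dict has a "$ref" key pointing into #/definitions."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Only string values count, so a property that merely happens
            # to be named "$ref" (whose value is a dict) is not a match.
            if key == "$ref" and isinstance(value, str) and value.startswith("#/definitions"):
                return True
            if has_local_ref(value):
                return True
        return False
    if isinstance(node, list):
        return any(has_local_ref(item) for item in node)
    return False


nested = {"body": {"allOf": [{"$ref": "#/definitions/ApiQuery"}]}}
flat = {"table_id": {"title": "Table Id", "type": "string"}}

nested_has_ref = has_local_ref(nested)
flat_has_ref = has_local_ref(flat)
```

Unlike the substring test on `schema_json()`, this does not fire when the text `$ref` only appears inside a description or title string.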
Could we do that logic and, if we see a reference, use jsonref instead of erroring? I don't love making jsonref a strict requirement, but it seems like a solid fallback for these edge cases.
We could also return ["properties"] and ["definitions"] without dereferencing
@hwchase17 I've implemented your suggestion of falling back to dereferencing with jsonref only if the schema includes $ref. Please check if I did it correctly regarding the optional dependency.
@vowelparrot Indeed – not sure if it would make the schema harder to understand for LLMs. The dereferencing at least appears to work very well for my use case.
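For readers following the thread, the fallback discussed above could look roughly like the sketch below. It is not the code from this PR: `expand_args` is a hypothetical helper that takes the schema as a plain dict, and returning the unresolved properties when jsonref is not installed is an assumption made here. It uses `jsonref.loads`, which resolves references against the document itself.

```python
import json


def expand_args(schema):
    """Return the top-level properties, dereferenced only when needed."""
    properties = schema["properties"]
    schema_json = json.dumps(schema)

    # Pragmatic check from the thread: no "$ref" key means nothing to resolve.
    if '"$ref"' not in schema_json:
        return properties

    try:
        # Optional dependency: imported lazily so that flat schemas
        # never require jsonref to be installed.
        import jsonref
    except ImportError:
        # Assumption: fall back to the old behaviour instead of raising.
        return properties

    # jsonref replaces each {"$ref": ...} object with a lazy proxy
    # for the definition it points to.
    return jsonref.loads(schema_json)["properties"]


flat_schema = {
    "title": "Simple",
    "type": "object",
    "properties": {"table_id": {"title": "Table Id", "type": "string"}},
}
nested_schema = {
    "title": "TableQuery",
    "type": "object",
    "properties": {
        "table_id": {"title": "Table Id", "type": "string"},
        "body": {"title": "Body", "allOf": [{"$ref": "#/definitions/ApiQuery"}]},
    },
    "definitions": {"ApiQuery": {"title": "ApiQuery", "type": "object"}},
}

flat_args = expand_args(flat_schema)
nested_args = expand_args(nested_schema)
```

Keeping the import inside the function is one way to make the dependency optional; whether a missing jsonref should warn, raise, or silently fall back is a design choice this sketch does not settle.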
Anecdotally, Davinci 3 and GPT-4 both handle reduced but not dereferenced specs nicely, but I'd assume most other models do not.
@jarib Hi, could you please resolve the merge conflicts and address the last comments (if needed)? After that, ping me and I'll push this PR for review. Thanks!
Closing because the PR wouldn't line up with the current directory structure of the library (would need to be in /libs/langchain/langchain instead of /langchain). Feel free to reopen against the current head if it's still relevant!
I have a problem with the prompt: the request sent to the LLM doesn't include any definitions. Could you please help me with this?