datamodel-code-generator
Unable to generate models for meta jsonschema
Describe the bug: I expected to be able to generate Pydantic models for validating JSON Schemas themselves.
To Reproduce: fetch the JSON Schema draft-07 meta-schema:
wget http://json-schema.org/draft-07/schema
Run datamodel-codegen:
datamodel-codegen --input schema --input-file-type jsonschema
Expected behavior: I expected the generator to produce data models able to parse JSON Schemas.
Version:
- OS: Ubuntu 20.04.2 LTS
- Python version: Python 3.8.5
- datamodel-code-generator version: 0.11.7
@Skeen Thank you for creating this issue. I have confirmed the error, but I don't yet understand why it occurs. I will look into it.
@Skeen
I ran datamodel-code-generator on the schema. Output:
pydantic.error_wrappers.ValidationError: 6 validation errors for JsonSchemaObject
properties -> default
  value is not a valid dict (type=type_error.dict)
properties -> examples -> items
  value is not a valid list (type=type_error.list)
properties -> examples -> items
  value is not a valid dict (type=type_error.dict)
properties -> const
  value is not a valid dict (type=type_error.dict)
properties -> enum -> items
  value is not a valid list (type=type_error.list)
properties -> enum -> items
  value is not a valid dict (type=type_error.dict)
I checked the details of the schema.
"examples": {
    "type": "array",
    "items": true
},
I don't understand why items is true. I expected a list of objects. What do you think about it?
This is unexpected to me too. I'd have imagined either a type definition or a reference within an object, not just true.
I was toying around with this issue, and it seems the main problem is that several fields (e.g. additionalItems, additionalProperties, items, and maybe some others) are set to true in the JSON Schema draft, but JsonSchemaObject isn't really able to take this into account. I think just allowing these fields to be Union[whatever, bool] would get past this snag.
Would be nice if it could self-bootstrap the jsonschema spec itself :)
Was able to make a tiny bit of progress by amending items to allow for bool:
items: Union[List['JsonSchemaObject'], 'JsonSchemaObject', bool, None]
This required changing parse_list_item, and I am not quite sure what it should return in that case, but I just returned [] if target_items is a bool.
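A minimal, standard-library-only sketch of the workaround described above. The helper name collect_item_schemas and the plain-dict Schema alias are hypothetical and not part of datamodel-code-generator; the real parse_list_item may look quite different.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-ins: a schema object is modelled as a plain dict here.
Schema = Dict[str, Any]
Items = Union[List[Schema], Schema, bool, None]


def collect_item_schemas(items: Items) -> List[Schema]:
    """Return the subschemas of an `items` value that need model generation."""
    if items is None or isinstance(items, bool):
        # A boolean (or absent) `items` carries no structure to generate from,
        # so there is nothing to parse. This mirrors the "return []" choice.
        return []
    if isinstance(items, dict):
        # A single schema applies to every element of the array.
        return [items]
    # A list of schemas (tuple form): one schema per position.
    return list(items)
```

With this shape, "items": true simply yields no nested models instead of a validation error, which is enough to get past the first failure but says nothing yet about how the resulting field should be typed.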
That just leaves the fields "default": true and "const": true in draft-07.json. That'll take a bit more digging.
...
Ok, I messed around with it for a while and found these locations, which seem amenable to a fix. Obviously I'd want to clean these up, but I think I have at least found the pain points. I also need to dig into the spec to determine what the intent of these "foo": true fields is. But this looks tractable!
https://github.com/xkortex/datamodel-code-generator/tree/xkortex/447/handle_true_attributes
Also I found a bug in the way the NonNegativeIntegerDefault0 is being generated:
class NonNegativeInteger(BaseModel):
    __root__: conint(ge=0)


class NonNegativeIntegerDefault0(BaseModel):
    pass
should be more like this I believe:
class NonNegativeIntegerDefault0(NonNegativeInteger):
    __root__: conint(ge=0) = 0
Possibly relevant: https://github.com/OAI/OpenAPI-Specification/issues/668#issue-150416089
So I think we might be able to substitute the empty object wherever we encounter these true fields.
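The substitution idea could be sketched as a pre-processing pass over the raw schema before it reaches the parser. This is only an illustration under stated assumptions: normalize_booleans and the keyword sets below are hypothetical, they cover the common draft-07 schema-valued keywords but not dependencies, and false is mapped to { "not": {} } on the assumption that it should be treated as the always-failing schema.

```python
from typing import Any

# Draft-07 keywords whose value is a single subschema.
SINGLE_SCHEMA_KEYS = {
    "additionalItems", "additionalProperties", "contains",
    "propertyNames", "not", "if", "then", "else",
}
# Keywords whose value maps names to subschemas.
SCHEMA_MAP_KEYS = {"properties", "patternProperties", "definitions"}
# Keywords whose value is a list of subschemas.
SCHEMA_LIST_KEYS = {"allOf", "anyOf", "oneOf"}


def normalize_booleans(schema: Any) -> Any:
    """Replace boolean subschemas with equivalent schema objects."""
    if schema is True:
        return {}
    if schema is False:
        return {"not": {}}
    if not isinstance(schema, dict):
        return schema
    out = {}
    for key, value in schema.items():
        if key in SINGLE_SCHEMA_KEYS:
            out[key] = normalize_booleans(value)
        elif key in SCHEMA_MAP_KEYS and isinstance(value, dict):
            out[key] = {name: normalize_booleans(sub) for name, sub in value.items()}
        elif key in SCHEMA_LIST_KEYS and isinstance(value, list):
            out[key] = [normalize_booleans(sub) for sub in value]
        elif key == "items":
            # `items` is either one subschema or a list of subschemas.
            if isinstance(value, list):
                out[key] = [normalize_booleans(sub) for sub in value]
            else:
                out[key] = normalize_booleans(value)
        else:
            # Plain values such as a real `default` or `const` stay untouched.
            out[key] = value
    return out
```

Note that in the meta-schema, "default": true and "const": true sit under properties, so they are subschemas for properties named default and const, and this pass rewrites them. A top-level "default": true is an ordinary value and is left alone.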
Also I was thinking about the whole self-hosting thing a bit more. In theory, since the json-schema describes itself, we ought to be able to compile the json-schema draft (e.g. draft-7.json) to python, then use that to parse that very same draft. The output of that parse should compile back to python. This could be a very useful way to test the veracity of the parser and generator.
In fact, that "self-compiling" python code could act as the main parser in the library. It's much, much shorter, so I wonder why there is so much manual logic in the current JsonSchemaObject. Maybe that's the bootstrapping overhead? :)
@xkortex I was checking out that thread and it looks like, yep, the JSON Schema spec treats items: true and items: {} identically. That's super interesting, and would solve my use case if taken into account.
I also think your idea about the self-describing schema is excellent; maybe worth a separate issue?
Update: this seems to clear things up, per the JSON Schema core specification draft:
[4.3.2](https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-02#section-4.3.2). Boolean JSON Schemas
The boolean schema values "true" and "false" are trivial schemas that
always produce themselves as assertions results, regardless of the
instance value. They never produce annotation results.
These boolean schemas exist to clarify schema author intent and
facilitate schema processing optimizations. They behave identically
to the following schema objects (where "not" is part of the subschema
application vocabulary defined in this document).
true: Always passes validation, as if the empty schema {}
false: Always fails validation, as if the schema { "not": {} }
While the empty schema object is unambiguous, there are many possible
equivalents to the "false" schema. Using the boolean values ensures
that the intent is clear to both human readers and implementations.
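The equivalence in the quoted section can be checked with a toy evaluator. This is a sketch, not a validator: the function name accepts is made up for illustration, and it understands only boolean schemas and the not keyword, ignoring every other keyword.

```python
from typing import Any, Dict, Union

Schema = Union[bool, Dict[str, Any]]


def accepts(schema: Schema, instance: Any) -> bool:
    """Toy evaluation: boolean schemas and `not` only."""
    if isinstance(schema, bool):
        # true accepts every instance, false rejects every instance.
        return schema
    if "not" in schema:
        return not accepts(schema["not"], instance)
    # Any other schema object is treated as the empty schema in this sketch.
    return True


# true behaves like {}, and false behaves like {"not": {}}, for any instance.
for instance in (None, 0, "text", [], {"a": 1}):
    assert accepts(True, instance) == accepts({}, instance)
    assert accepts(False, instance) == accepts({"not": {}}, instance)
```

The result depends only on the schema, never on the instance, so true and false say nothing about whether a value is empty; they accept or reject everything.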
So... a "true" just means there's a subschema that can be empty, while a "false" is a subschema that cannot be empty?
Also I believe this would solve #696
My understanding of the draft-07 specification is that arrays with items set to true are tuples rather than lists, in which additional items are accepted (they would be rejected if it were false). See https://json-schema.org/understanding-json-schema/reference/array.html#items under the tuple section.
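For comparison, here is a small sketch of how I read the draft-07 rules for items and additionalItems. The names array_is_valid and check are hypothetical, and check only understands boolean schemas and a few type values. In this reading, tuple validation applies when items is a list of schemas, and additionalItems is consulted only in that case.

```python
from typing import Any, Dict, List, Union

Schema = Union[bool, Dict[str, Any]]
TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def check(schema: Schema, value: Any) -> bool:
    """Tiny subschema check: boolean schemas and the `type` keyword only."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected is None:
        return True
    if expected == "integer" and isinstance(value, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(value, TYPES[expected])


def array_is_valid(schema: Dict[str, Any], instance: List[Any]) -> bool:
    items = schema.get("items", True)
    if isinstance(items, list):
        # Tuple form: one schema per position, the rest use additionalItems.
        extra = schema.get("additionalItems", True)
        return all(
            check(items[i] if i < len(items) else extra, element)
            for i, element in enumerate(instance)
        )
    # Single schema (including true/false): applies to every element.
    return all(check(items, element) for element in instance)
```

Under these rules, {"items": true} accepts any array, while a tuple such as {"items": [{"type": "integer"}, {"type": "string"}], "additionalItems": false} accepts [1, "a"] and rejects [1, "a", 3].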
I'm sorry for my too-late reply.
I have released the PR as 0.15.0
Thank you very much!!
Thank you for all the good work!