datamodel-code-generator

Unable to generate models for meta jsonschema

Open Skeen opened this issue 3 years ago • 7 comments

Describe the bug I expected to be able to generate Pydantic models for validating JSON Schemas themselves.

To Reproduce Fetch the json-schema meta-schema draft 7:

wget http://json-schema.org/draft-07/schema

Run datamodel-codegen:

datamodel-codegen --input schema --input-file-type jsonschema

Expected behavior I expected data models capable of parsing JSON Schemas to be generated.

Version:

  • OS: Ubuntu 20.04.2 LTS
  • Python version: Python 3.8.5
  • datamodel-code-generator version: 0.11.7

Skeen avatar Jun 11 '21 11:06 Skeen

@Skeen Thank you for creating this issue. I have confirmed the error, but I don't yet understand why it occurs. I will look into it.

koxudaxi avatar Jun 11 '21 16:06 koxudaxi

@Skeen

I ran datamodel-code-generator on the schema. Output:

pydantic.error_wrappers.ValidationError: 6 validation errors for JsonSchemaObject
properties -> default
  value is not a valid dict (type=type_error.dict)
properties -> examples -> items
  value is not a valid list (type=type_error.list)
properties -> examples -> items
  value is not a valid dict (type=type_error.dict)
properties -> const
  value is not a valid dict (type=type_error.dict)
properties -> enum -> items
  value is not a valid list (type=type_error.list)
properties -> enum -> items
  value is not a valid dict (type=type_error.dict)

I checked the details of the schema.

        "examples": {
            "type": "array",
            "items": true
        },

I don't understand why items is set to true. I expected a list of objects.

What do you think about it?

koxudaxi avatar Aug 21 '21 10:08 koxudaxi

This is unexpected to me too, I'd have imagined either a type definition or a reference within an object, not just true.

Skeen avatar Aug 21 '21 12:08 Skeen

I was toying around with this issue, and it seems the main problem is that several fields (e.g. additionalItems, additionalProperties, items, maybe some others) are set to true in the jsonschema draft, but JsonSchemaObject isn't able to take this into account. I think just allowing these fields to be Union[whatever, bool] might get past this snag.
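To see which positions are affected, here is a rough stdlib-only sketch that walks a schema and reports every subschema position holding a bare boolean. The function name find_boolean_schemas and the keyword sets are my own illustration, not part of datamodel-code-generator, and the excerpt is an abridged hand copy of the draft-07 meta-schema rather than the full file. Keywords such as dependencies are left out for brevity.

```python
from typing import Any, Iterator, Tuple

# Draft-07 keywords whose value is a single subschema (illustrative subset).
SINGLE = {"additionalItems", "additionalProperties", "contains",
          "propertyNames", "not", "if", "then", "else"}
# Keywords whose value maps names to subschemas.
MAPPING = {"properties", "patternProperties", "definitions"}
# Keywords whose value is a list of subschemas.
LISTS = {"allOf", "anyOf", "oneOf"}


def find_boolean_schemas(schema: Any, path: str = "#") -> Iterator[Tuple[str, bool]]:
    """Yield (path, value) for each subschema position that holds a bare boolean."""
    if isinstance(schema, bool):
        yield path, schema
        return
    if not isinstance(schema, dict):
        return
    for key, value in schema.items():
        child = f"{path}/{key}"
        if key in MAPPING and isinstance(value, dict):
            for name, sub in value.items():
                yield from find_boolean_schemas(sub, f"{child}/{name}")
        elif (key in LISTS or key == "items") and isinstance(value, list):
            for index, sub in enumerate(value):
                yield from find_boolean_schemas(sub, f"{child}/{index}")
        elif key in SINGLE or key == "items":
            yield from find_boolean_schemas(value, child)


# Abridged excerpt of the draft-07 meta-schema (hand-copied, not the full file).
excerpt = {
    "type": ["object", "boolean"],
    "properties": {
        "default": True,
        "const": True,
        "examples": {"type": "array", "items": True},
        "enum": {"type": "array", "items": True},
    },
    "default": True,  # a default *value*, not a subschema, so it is not reported
}

hits = list(find_boolean_schemas(excerpt))
for location, value in hits:
    print(location, value)
```

On this excerpt the reported locations line up with the paths in the ValidationError above (properties -> default, properties -> const, and the items of examples and enum).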

Would be nice if it could self-bootstrap the jsonschema spec itself :)

Was able to make a tiny bit of progress by amending items to allow for bool:

    items: Union[List['JsonSchemaObject'], 'JsonSchemaObject', bool, None]

This required messing with parse_list_item, and I am not quite sure what it should return in that case, so for now I just return [] if target_items is a bool.

That just leaves the fields in draft-07.json "default": true and "const": true. That'll take a bit more digging.

...

Ok I messed around with it for a while and found these locations which seem to be amenable. Obviously I'd want to clean these up but I think I have at least found the pain points. I also need to dig into the spec to determine what the intent of these "foo": true fields is. But this looks tractable!

https://github.com/xkortex/datamodel-code-generator/tree/xkortex/447/handle_true_attributes

Also I found a bug in the way the NonNegativeIntegerDefault0 is being generated:

class NonNegativeInteger(BaseModel):
    __root__: conint(ge=0)


class NonNegativeIntegerDefault0(BaseModel):
    pass

should be more like this I believe:

class NonNegativeIntegerDefault0(NonNegativeInteger):
    __root__: conint(ge=0) = 0

xkortex avatar Jun 25 '22 00:06 xkortex

Possibly relevant: https://github.com/OAI/OpenAPI-Specification/issues/668#issue-150416089

So I think we might be able to substitute in the empty object in places where we encounter these true fields.
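A minimal sketch of that substitution, assuming it runs on the raw dict before it reaches the parser. The names as_object_schema, normalize_keywords and SUBSCHEMA_KEYWORDS are hypothetical helpers for illustration, not existing functions in datamodel-code-generator, and the pass is deliberately non-recursive.

```python
from typing import Any, Dict

# Keywords that hold a single subschema and may therefore be a bare boolean
# (illustrative subset of draft-07, not exhaustive).
SUBSCHEMA_KEYWORDS = ("items", "additionalItems", "additionalProperties",
                      "contains", "propertyNames", "not")


def as_object_schema(schema: Any) -> Any:
    """Replace a boolean schema with its object equivalent; pass others through."""
    if schema is True:
        return {}             # accepts every instance
    if schema is False:
        return {"not": {}}    # rejects every instance
    return schema


def normalize_keywords(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy with boolean subschemas replaced (one level only)."""
    result = dict(schema)
    for key in SUBSCHEMA_KEYWORDS:
        if key in result:
            result[key] = as_object_schema(result[key])
    return result


print(normalize_keywords({"type": "array", "items": True}))
print(normalize_keywords({"type": "object", "additionalProperties": False}))
```

A real fix would need to recurse into properties, definitions and friends as well; this only shows the shape of the replacement.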

Also I was thinking about the whole self-hosting thing a bit more. In theory, since the json-schema describes itself, we ought to be able to compile the json-schema draft (e.g. draft-7.json) to python, then use that to parse that very same draft. The output of that parse should compile back to python. This could be a very useful way to test the veracity of the parser and generator.

In fact, that "self-compiling" python code could act as the main parser in the library. It's much, much shorter, so I wonder why there is so much manual logic in the current JsonSchemaObject. Maybe that's the bootstrapping overhead? :)

xkortex avatar Jun 25 '22 17:06 xkortex

@xkortex I was checking out that thread and it looks like, yep, the jsonschema spec treats items: true and items: {} identically. That's super interesting, and would solve my use case if taken into account.

I also think your idea about the self-describing schema is excellent; maybe worth a separate issue?


update: this seems to clear things up, per the JSON Schema core draft (draft-handrews-json-schema-02):

[4.3.2](https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-02#section-4.3.2).  Boolean JSON Schemas

   The boolean schema values "true" and "false" are trivial schemas that
   always produce themselves as assertions results, regardless of the
   instance value.  They never produce annotation results.

   These boolean schemas exist to clarify schema author intent and
   facilitate schema processing optimizations.  They behave identically
   to the following schema objects (where "not" is part of the subschema
   application vocabulary defined in this document).

   true:  Always passes validation, as if the empty schema {}

   false:  Always fails validation, as if the schema { "not": {} }

   While the empty schema object is unambiguous, there are many possible
   equivalents to the "false" schema.  Using the boolean values ensures
   that the intent is clear to both human readers and implementations.

So... a "true" just means there's a subschema that can be empty, while a "false" is a subschema that cannot be empty?
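As I read the quoted section, it is less about emptiness and more about acceptance: true accepts every instance and false rejects every instance. A toy sketch to make that concrete (the accepts function is hypothetical and only understands boolean schemas, the empty schema, and "not"; anything else raises):

```python
from typing import Any


def accepts(schema: Any, instance: Any) -> bool:
    """Toy validator: handles boolean schemas, {} and {"not": ...} only."""
    if isinstance(schema, bool):
        # A boolean schema yields itself as the result, whatever the instance.
        return schema
    unsupported = set(schema) - {"not"}
    if unsupported:
        raise NotImplementedError(f"keywords outside this sketch: {sorted(unsupported)}")
    if "not" in schema:
        return not accepts(schema["not"], instance)
    return True  # the empty schema places no constraints


samples = [None, 0, "", [], {}, {"a": 1}]
for value in samples:
    # true behaves like {}, false behaves like {"not": {}}
    assert accepts(True, value) == accepts({}, value)
    assert accepts(False, value) == accepts({"not": {}}, value)
```

So "items": true says every element is allowed, and "items": false would say no element is allowed (only an empty array passes).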

rtbs-dev avatar Jul 20 '22 15:07 rtbs-dev

Also I believe this would solve #696

rtbs-dev avatar Aug 03 '22 15:08 rtbs-dev

My understanding of the draft-07 specification is that arrays with "items": true are tuples rather than lists, and that additional items are accepted (they would be rejected if it were false). See https://json-schema.org/understanding-json-schema/reference/array.html#items under the tuple section.
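For comparison, here is a rough sketch of how I read the draft-07 rules for items and additionalItems: when items is a single schema (including a boolean) it applies to every element, and only when items is an array of schemas does positional (tuple) validation apply, with additionalItems covering the remaining elements. The helpers check and array_ok are hypothetical and only cover boolean schemas plus "type": "string" / "boolean".

```python
from typing import Any, Dict, List


def check(schema: Any, value: Any) -> bool:
    """Toy subschema check: boolean schemas, {} and a 'type' of string/boolean."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected is None:
        return True
    if expected == "string":
        return isinstance(value, str)
    if expected == "boolean":
        return isinstance(value, bool)
    raise NotImplementedError(f"type outside this sketch: {expected!r}")


def array_ok(schema: Dict[str, Any], array: List[Any]) -> bool:
    """Apply draft-07 style items/additionalItems rules to one array."""
    items = schema.get("items", True)
    if isinstance(items, list):
        # Tuple form: each element is checked against the schema at its position.
        for sub, element in zip(items, array):
            if not check(sub, element):
                return False
        # additionalItems only matters in tuple form; it covers the leftovers.
        extra = schema.get("additionalItems", True)
        return all(check(extra, element) for element in array[len(items):])
    # Single-schema form: the same schema applies to every element.
    return all(check(items, element) for element in array)


closed_tuple = {"items": [{"type": "string"}, {"type": "boolean"}],
                "additionalItems": False}

print(array_ok({"items": True}, [1, "a", None]))   # list form, anything goes
print(array_ok(closed_tuple, ["a", True]))         # matches both positions
print(array_ok(closed_tuple, ["a", True, 3]))      # extra element is rejected
```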

sebastroy avatar Dec 02 '22 19:12 sebastroy

I'm sorry for the very late reply. The PR has been released in 0.15.0. Thank you very much!!

koxudaxi avatar Jan 04 '23 16:01 koxudaxi

Thank you for all the good work!

sebastroy avatar Jan 05 '23 17:01 sebastroy