datamodel-code-generator

I'm probably doing something wrong....

Open dbolser opened this issue 1 year ago • 4 comments

The JSON Schema is here: https://www.encodeproject.org/profiles/experiment#raw

I'm calling: datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel

The model is loaded perfectly by pydantic, but my specific data fails to validate:

$ python validate.py 
Validation failed!
165 validation errors for Experiment
biosample_ontology
  Input should be a valid string [type=string_type, input_value={'status': 'released', 's... 'K-562', 'K-562 cell']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.0
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...'ENCODE4 v1.2.1 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.1
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...ENCODE4 v1.15.0 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
...

Here is the code of my simple validator script:

import json
from pydantic import ValidationError

from models.experiment import Experiment

def validate_json(json_data):
    try:
        Experiment(**json_data)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed!")
        print(e)

if __name__ == "__main__":
    # Load JSON data to be validated
    with open('experiment-ENCSR545YBD.json', 'r') as f:
        data_json = json.load(f)
    
    # Validate
    validate_json(data_json)

The experiment JSON can be downloaded here: https://www.encodeproject.org/experiments/ENCSR545YBD/?format=json

I guess it has to do with the complex data types not being strings, and being represented as $refs, but I don't know what I'm doing 😅
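One way to check that guess, as a minimal sketch: compare the type each top-level property declares in the schema with what the data actually carries. The helper name `find_embedded_mismatches` is hypothetical, and the small inline `schema` and `record` only mimic the shape of the ENCODE documents; they are not the real files.

```python
def find_embedded_mismatches(schema: dict, record: dict) -> list[str]:
    """Return paths where the schema declares a string but the data holds an object.

    Only top-level properties and one level of array items are checked;
    this is a diagnostic sketch, not a JSON Schema validator.
    """
    mismatches = []
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        items = spec.get("items")
        if spec.get("type") == "string" and isinstance(value, dict):
            mismatches.append(name)
        elif spec.get("type") == "array" and isinstance(value, list):
            if isinstance(items, dict) and items.get("type") == "string":
                for i, item in enumerate(value):
                    if isinstance(item, dict):
                        mismatches.append(f"{name}.{i}")
    return mismatches


# Illustrative stand-ins for experiment.json and the downloaded record.
schema = {
    "properties": {
        "biosample_ontology": {"type": "string", "linkTo": "BiosampleType"},
        "analyses": {"type": "array", "items": {"type": "string"}},
        "accession": {"type": "string"},
    }
}
record = {
    "biosample_ontology": {"status": "released"},
    "analyses": [{"documents": []}, {"documents": []}],
    "accession": "ENCSR545YBD",
}

print(find_embedded_mismatches(schema, record))
```

The reported paths use the same `name.index` form as the pydantic error locations above, so the two lists can be compared directly.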

dbolser avatar Sep 27 '23 14:09 dbolser

I have the feeling that I need to cross reference several other objects ... somehow...

dbolser avatar Sep 27 '23 14:09 dbolser

Thank you for creating the issue. I have checked the data and the schema. According to the schema, biosample_ontology should be a string, but in the actual data it is an object?

{
    "title": "Biosample ontology",
    "description": "An embeded property for linking to biosample type which describes the ontology of the biosample.",
    "comment": "See biosample_type.json for available identifiers.",
    "type": "string",
    "linkTo": "BiosampleType"
}
 "biosample_ontology": {"status": "released", 

koxudaxi avatar Oct 04 '23 16:10 koxudaxi

Thank you for looking at this!

Yes.

They have this weird API mode 'qualifier' (called frame¹) that returns valid (according to the schema) JSON, but I don't understand how you're supposed to 'follow the refs' from, for example, the biosample_ontology string to the full underlying object, specified by the biosample_type² schema document...

¹ frame=object will always give you all of the properties, with embedded objects referred to by an identifier. https://www.encodeproject.org/help/rest-api/#search-features

² https://www.encodeproject.org/profiles/biosample_type
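If frame=object behaves as the footnote describes, requesting the record with that parameter should return embedded objects as identifier strings, which is what the generated model expects. A minimal sketch of building such a request follows. The function names are hypothetical, and combining `frame=object` with `format=json` in one query string is an assumption, not something confirmed in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.encodeproject.org"


def experiment_url(accession: str, frame: str = "object") -> str:
    """Build the JSON URL for an experiment, asking for the given frame."""
    query = urlencode({"format": "json", "frame": frame})
    return f"{BASE_URL}/experiments/{accession}/?{query}"


def fetch_experiment(accession: str, frame: str = "object") -> dict:
    """Download the record (needs network access; not called in this sketch)."""
    request = Request(
        experiment_url(accession, frame),
        headers={"Accept": "application/json"},
    )
    with urlopen(request) as response:
        return json.load(response)


print(experiment_url("ENCSR545YBD"))
```

The dict returned by `fetch_experiment` could then be passed to the validator script in place of the file downloaded with the default frame.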

Is there some sort of 'lazy load' API xref that can be assigned to that field by the schema?
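`linkTo` does not appear to be a standard JSON Schema keyword, so a code generator is unlikely to produce a lazy reference from it on its own. A hand-written wrapper is one possible workaround. The sketch below is only an illustration: `LazyRef`, `fake_fetch`, and the identifier string are all hypothetical, and the fetcher is a stand-in so the example runs offline.

```python
from typing import Callable, Optional


class LazyRef:
    """Hold an identifier and resolve it to the full object on first access."""

    def __init__(self, identifier: str, fetch: Callable[[str], dict]):
        self.identifier = identifier
        self._fetch = fetch
        self._cached: Optional[dict] = None

    def resolve(self) -> dict:
        # Fetch once, then reuse the cached object on later calls.
        if self._cached is None:
            self._cached = self._fetch(self.identifier)
        return self._cached


# Stand-in fetcher; a real one would issue an HTTP GET for the identifier.
calls = []


def fake_fetch(identifier: str) -> dict:
    calls.append(identifier)
    return {"identifier": identifier, "status": "released"}


ref = LazyRef("/biosample-types/example/", fake_fetch)  # made-up identifier
first = ref.resolve()
second = ref.resolve()
print(first["status"], len(calls))
```

Passing the fetcher in as an argument keeps the wrapper independent of any particular HTTP client and makes it easy to test.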

What's the API definition language?

Actually their documentation shows an example, I'm just not sure how to match the embedded objects referred to by an identifier with the right schema, and then put that in the model as a lazy load...
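For the matching step, one option is to read the `linkTo` annotations out of the schema and derive which profile each identifier field points at. This is a rough sketch: `link_targets` and `profile_path` are hypothetical helpers, the inline schema is a cut-down stand-in, and the CamelCase to snake_case mapping is an assumption based only on `BiosampleType` corresponding to `/profiles/biosample_type`.

```python
import re


def link_targets(schema: dict) -> dict[str, str]:
    """Map each top-level property to its `linkTo` target, if it has one."""
    targets = {}
    for name, spec in schema.get("properties", {}).items():
        items = spec.get("items")
        if isinstance(spec.get("linkTo"), str):
            targets[name] = spec["linkTo"]
        elif isinstance(items, dict) and isinstance(items.get("linkTo"), str):
            # Arrays of identifiers carry the annotation on their items.
            targets[name] = items["linkTo"]
    return targets


def profile_path(link_to: str) -> str:
    """Guess the profile path from a linkTo name (assumed CamelCase -> snake_case)."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", link_to).lower()
    return f"/profiles/{snake}"


# Cut-down stand-in for the experiment schema.
schema = {
    "properties": {
        "biosample_ontology": {"type": "string", "linkTo": "BiosampleType"},
        "accession": {"type": "string"},
    }
}

for prop, target in link_targets(schema).items():
    print(prop, "->", profile_path(target))
```

Each resulting profile could be run through datamodel-codegen separately, giving one model per linked type to validate the resolved objects against.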

Anyway, thanks again for your help.

Cheers, Dan

dbolser avatar Oct 04 '23 21:10 dbolser

BTW, do I need to worry about these warnings?

$ datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel
/me/.venv/lib/python3.10/site-packages/datamodel_code_generator/parser/jsonschema.py:334: UserWarning: format of 'accession' not understood for 'string' - using default
  warn(f'format of {format__!r} not understood for {type_!r} - using default' '')

dbolser avatar Oct 05 '23 15:10 dbolser