datamodel-code-generator
I'm probably doing something wrong...
The JSON Schema is here: https://www.encodeproject.org/profiles/experiment#raw
I'm calling:
```
datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel
```
The model is loaded perfectly by pydantic, but my specific data fails to validate:
```
$ python validate.py
Validation failed!
165 validation errors for Experiment
biosample_ontology
  Input should be a valid string [type=string_type, input_value={'status': 'released', 's... 'K-562', 'K-562 cell']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.0
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...'ENCODE4 v1.2.1 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.1
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...ENCODE4 v1.15.0 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
...
```
Here is the code of my simple validator script:
```python
import json

from pydantic import ValidationError

from models.experiment import Experiment


def validate_json(json_data):
    try:
        Experiment(**json_data)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed!")
        print(e)


if __name__ == "__main__":
    # Load JSON data to be validated
    with open('experiment-ENCSR545YBD.json', 'r') as f:
        data_json = json.load(f)
    # Validate
    validate_json(data_json)
```
The experiment JSON can be downloaded here: https://www.encodeproject.org/experiments/ENCSR545YBD/?format=json
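With 165 errors, it helps to see which top-level fields fail and how. A minimal sketch, assuming pydantic v2: it calls `model_validate` (the v2 class method that does the same job as `Experiment(**json_data)`) and tallies `ValidationError.errors()` by field and error type. The two-field `Experiment` and the sample data are hypothetical stand-ins; with the real generated model you would import it from `models.experiment` and pass the downloaded JSON instead.

```python
from collections import Counter
from typing import Any

from pydantic import BaseModel, ValidationError


def summarize_errors(model: type[BaseModel], data: dict[str, Any]) -> Counter:
    """Count validation errors per (top-level field, error type) pair."""
    try:
        model.model_validate(data)
    except ValidationError as e:
        # Each entry of e.errors() has a 'loc' tuple such as ('analyses', 0)
        # and a machine-readable 'type' such as 'string_type'.
        return Counter(
            (str(err["loc"][0]) if err["loc"] else "<root>", err["type"])
            for err in e.errors()
        )
    return Counter()  # no errors


# Hypothetical two-field stand-in for the generated Experiment model.
class Experiment(BaseModel):
    biosample_ontology: str
    analyses: list[str]


if __name__ == "__main__":
    # Hypothetical input shaped like the failing data: objects where strings are expected.
    sample = {
        "biosample_ontology": {"status": "released"},
        "analyses": [{"documents": []}, {"documents": []}],
    }
    for (field, err_type), count in summarize_errors(Experiment, sample).most_common():
        print(f"{field}: {count} x {err_type}")
```

A summary dominated by `string_type` errors points at the same pattern in every field: the data contains embedded objects where the schema expects identifier strings.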
I guess it's to do with the complex data types not being strings, and being represented as $refs, but I don't know what I'm doing 😅
I have a feeling that I need to cross-reference several other objects... somehow.
Thank you for creating the issue.
I have checked the data and the schema.
I think `biosample_ontology` should be a string, according to the schema. But the actual value in your data is an object?
```json
{
  "title": "Biosample ontology",
  "description": "An embeded property for linking to biosample type which describes the ontology of the biosample.",
  "comment": "See biosample_type.json for available identifiers.",
  "type": "string",
  "linkTo": "BiosampleType"
}
```

```
"biosample_ontology": {"status": "released",
```
Thank you for looking at this!
Yes.
They have this weird API mode 'qualifier' (called frame¹) that returns valid (according to the schema) JSON, but I don't understand how you're supposed to 'follow the refs' from, for example, the biosample_ontology string to the full underlying object, specified by the biosample_type² schema document...
¹ `frame=object` will always give you all of the properties, with embedded objects referred to by an identifier. https://www.encodeproject.org/help/rest-api/#search-features
² https://www.encodeproject.org/profiles/biosample_type
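To get data shaped like the schema, the object frame can be requested explicitly. A minimal sketch using only the standard library; it assumes the `frame=object` query parameter from the ENCODE REST API help combines with `format=json` as in the download link above. `experiment_url` and `fetch_json` are hypothetical helper names.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENCODE_BASE = "https://www.encodeproject.org"


def experiment_url(accession: str, frame: str = "object") -> str:
    """Build an experiment URL asking for JSON in the given frame (hypothetical helper)."""
    query = urlencode({"format": "json", "frame": frame})
    return f"{ENCODE_BASE}/experiments/{accession}/?{query}"


def fetch_json(url: str) -> dict[str, Any]:
    """Download and parse a JSON document (needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


print(experiment_url("ENCSR545YBD"))
# https://www.encodeproject.org/experiments/ENCSR545YBD/?format=json&frame=object
```

`Experiment.model_validate(fetch_json(experiment_url("ENCSR545YBD")))` would then validate the object-frame response against the generated model; whether every field passes that way is not confirmed in this thread.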
Is there some sort of 'lazy load' API xref that can be assigned to that field by the schema?
What's the API definition language?
Actually, their documentation shows an example; I'm just not sure how to match the embedded objects referred to by an identifier with the right schema, and then put that in the model as a lazy load...
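One way to get a "lazy load" is to keep the generated model as-is (identifier strings) and resolve links only when they are needed. A minimal sketch, assuming each linked value is a site-relative path on www.encodeproject.org that can be fetched with the same `format=json&frame=object` query; `LinkResolver` and `http_fetch` are hypothetical names, not part of datamodel-code-generator or pydantic. The fetch function is injectable so the caching logic can be exercised offline.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen

ENCODE_BASE = "https://www.encodeproject.org"


def http_fetch(identifier: str) -> dict[str, Any]:
    """Fetch one linked object; assumes the identifier is a site-relative path."""
    with urlopen(f"{ENCODE_BASE}{identifier}?format=json&frame=object") as response:
        return json.load(response)


class LinkResolver:
    """Resolve identifier strings to their objects lazily, caching each one.

    Nothing is fetched until resolve() is called, and each identifier is
    fetched at most once per resolver.
    """

    def __init__(self, fetch: Callable[[str], dict[str, Any]] = http_fetch) -> None:
        self._fetch = fetch
        self._cache: dict[str, dict[str, Any]] = {}

    def resolve(self, identifier: str) -> dict[str, Any]:
        if identifier not in self._cache:
            self._cache[identifier] = self._fetch(identifier)
        return self._cache[identifier]


if __name__ == "__main__":
    # Offline demo with a fake fetcher standing in for the HTTP call.
    calls: list[str] = []

    def fake_fetch(identifier: str) -> dict[str, Any]:
        calls.append(identifier)
        return {"status": "released"}

    resolver = LinkResolver(fetch=fake_fetch)
    resolver.resolve("/biosample-types/EXAMPLE/")
    resolver.resolve("/biosample-types/EXAMPLE/")
    print(len(calls))  # 1: the second call is served from the cache
```

With a model generated from the biosample_type profile (say, a hypothetical `BiosampleType`), the linked object could then be checked with `BiosampleType.model_validate(resolver.resolve(experiment.biosample_ontology))`.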
Anyway, thanks again for your help.
Cheers, Dan
(Reply to Koudai Aono's comment of Wed, 4 Oct 2023, 5:47 pm, on koxudaxi/datamodel-code-generator issue #1577.)
BTW, do I need to worry about these warnings?
```
$ datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel
/me/.venv/lib/python3.10/site-packages/datamodel_code_generator/parser/jsonschema.py:334: UserWarning: format of 'accession' not understood for 'string' - using default
  warn(f'format of {format__!r} not understood for {type_!r} - using default' '')
```
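The warning says the schema uses a `format` value (`accession`) that the generator does not recognize, so it falls back to its default handling for a plain string field. If you want pydantic to check accessions anyway, the field can be constrained by hand after generation. A minimal sketch, assuming pydantic v2; the regular expression is a hypothetical pattern inferred from the single accession in this thread (`ENCSR545YBD`), not ENCODE's published rule, and `ExperimentRef` is a hypothetical model name.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

# Hypothetical pattern: "ENC", two letters, three digits, three letters (e.g. ENCSR545YBD)
Accession = Annotated[str, Field(pattern=r"^ENC[A-Z]{2}\d{3}[A-Z]{3}$")]


class ExperimentRef(BaseModel):
    accession: Accession


print(ExperimentRef(accession="ENCSR545YBD").accession)  # ENCSR545YBD

try:
    ExperimentRef(accession="not-an-accession")
except ValidationError as e:
    print(e.errors()[0]["type"])  # string_pattern_mismatch
```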