
`_get_unique_name` takes really long with a large swagger definition file

Open FJEANNOT opened this issue 9 months ago • 1 comments

Profiling shows a really long execution time for the `_get_unique_name` method of the `ModelResolver` class.
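For anyone wanting to reproduce the measurement, here is a minimal sketch of profiling a call with the standard-library `cProfile` module. The helper name `profile_call` is hypothetical and not part of datamodel-code-generator; it only wraps whatever callable is passed in.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs) -> str:
    """Run func under cProfile and return the top entries by cumulative time.

    Hypothetical helper for illustration; not part of any library.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()

    # Render the ten most expensive entries into a string instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()
```

Passing the `generate` call from the reproduction steps to such a helper should list the most expensive functions first, which is how a hot spot like `_get_unique_name` would show up.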


To Reproduce

Example schema:
We use a large swagger file created on the fly, typically a merged JSON Schema, with some modifications, of all the schemas in this folder: https://github.com/crossplane-contrib/provider-upjet-azure/blob/main/package/crds. For instance, we build the property name from the k8s apiGroup in this folder, then append the kind one more time (e.g. io.upbound.azure.analysisservices.Server becomes io.upbound.azure.analysisservices.Server.Server) to make sure datamodel-codegen outputs the resource in a dedicated file.
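To make the renaming concrete, here is a rough sketch of that step. It assumes the merged definitions live in a plain dict keyed by the dotted name; the function names `dedicated_name` and `rename_definitions` are hypothetical and only illustrate the idea, not our actual script.

```python
def dedicated_name(name: str) -> str:
    """Append the last dotted segment (the kind) one more time."""
    kind = name.rsplit(".", 1)[-1]
    return f"{name}.{kind}"


def rename_definitions(definitions: dict[str, dict]) -> dict[str, dict]:
    """Return a new mapping whose keys carry the duplicated kind suffix."""
    return {dedicated_name(key): schema for key, schema in definitions.items()}


definitions = {"io.upbound.azure.analysisservices.Server": {"type": "object"}}
renamed = rename_definitions(definitions)
```

Any `$ref` pointing at the old keys would need the same rewrite, which this sketch leaves out.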

Used commandline:

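    import json

    from datamodel_code_generator import DataModelType, InputFileType, generate
    from datamodel_code_generator.format import PythonVersion

    # `swagger` (the merged definition, a dict) and `output_dir` (a pathlib.Path)
    # are built earlier in our script.
    generate(
        json.dumps(swagger),  # our really long swagger definition
        collapse_root_models=True,
        disable_timestamp=True,
        field_constraints=True,
        input_file_type=InputFileType.JsonSchema,
        output=output_dir,
        output_model_type=DataModelType.PydanticV2BaseModel,
        target_python_version=PythonVersion.PY_312,
        use_annotated=True,
        use_field_description=True,
        use_schema_description=True,
        use_subclass_enum=True,
        custom_formatters=[],
    )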

Looking at the function, it seems to iterate over every 'object' property of the entire definition to avoid duplicate names. We tried adding a title to every object and passing the option use_title_as_name, but _get_unique_name is still called by get_class_name.

To my understanding, the function iterates over every object reference to come up with a unique name and avoid duplicate class names in the output. But since we are rendering the classes in dedicated files, each containing only a few classes of the complete output, isn't it possible to limit the reference list to only the ones present in the output file that will contain the class?
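To illustrate why this gets slow, here is a simplified sketch of the pattern described above. It is not the actual `ModelResolver` code: the class names, the `add` method, and the numeric-suffix scheme are all assumptions made for the example. The first resolver rescans every known reference on each call, so n names cost roughly O(n²) overall; the second keeps a running set of taken names and produces the same result.

```python
def pick_unique(name: str, taken: set[str]) -> str:
    """Append an increasing number until the name is free (assumed scheme)."""
    candidate = name
    count = 1
    while candidate in taken:
        candidate = f"{name}{count}"
        count += 1
    return candidate


class RescanningResolver:
    """Rebuilds the set of taken names on every call: O(n) per lookup."""

    def __init__(self) -> None:
        self.references: dict[str, str] = {}  # reference path -> class name

    def add(self, path: str, name: str) -> str:
        taken = set(self.references.values())  # full scan each time
        unique = pick_unique(name, taken)
        self.references[path] = unique
        return unique


class CachedResolver:
    """Maintains the set of taken names incrementally: O(1) per lookup."""

    def __init__(self) -> None:
        self.references: dict[str, str] = {}
        self._taken: set[str] = set()

    def add(self, path: str, name: str) -> str:
        unique = pick_unique(name, self._taken)
        self.references[path] = unique
        self._taken.add(unique)
        return unique
```

Scoping the set of taken names to a single output module, as suggested above, would be a variation on the cached version with one set per module.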

Also, is it possible to skip _get_unique_name entirely when use_title_as_name is set?

Additional context

As mentioned above, the swagger definition comes from the Kubernetes ecosystem.

The complete swagger file we are using is quite big. We combine the swagger.json file provided by our Kubernetes server with a large list of CustomResourceDefinition OpenAPISchemas. I can provide the complete script if needed.

That being said, the script still takes a really long time even with a smaller (but still large) swagger file, like the combination of all the CustomResourceDefinitions linked above.

FJEANNOT, Jan 28 '25 14:01

PR welcome.

gaborbernat, Feb 06 '25 20:02