jsonschema-gentypes

Reproducible generation

Open vron opened this issue 6 months ago • 6 comments

I have a large-ish (9k lines) schema.json file that is used to generate Python types. What I find, however, is that re-running jsonschema-gentypes several times on the same schema file generates a different output .py file each time.

To be clear, the generated outputs are all "functionally equivalent", but types and default values are given different names (with random numbers added), for example:

    -_REQUEST_AREA8181_DEFAULT = 'Nord'
    +_REQUEST_AREA4825_DEFAULT = 'Nord'

    -_PARSED_SOURCE_REPR_MODELS5526_DEFAULT = {'prop': {}, 'spot': {}, 'tr': {}}
    +_PARSED_SOURCE_REPR_MODELS3775_DEFAULT = {'prop': {}, 'spot': {}, 'tr': {}}

    -    maturity: Required[tuple["FilterMaturity09394", "FilterMaturity17630"]]
    +    maturity: Required[tuple["FilterMaturity07100", "FilterMaturity12908"]]

Is creating reproducible files a goal of jsonschema-gentypes, such that it would be of interest to investigate why this is happening and fix it? Or is it not?

vron avatar May 15 '25 09:05 vron

Indeed, in some cases I generate a random number to avoid duplicated names: https://github.com/sbrunner/jsonschema-gentypes/blob/b2ecef8520a628c6601bbaa159359d13029d5898/jsonschema_gentypes/__init__.py#L794

The current way to avoid this is to set a unique title on the affected element in the JSON schema, but I understand that's not always possible.

So any better solution to avoid this is welcome :-)

sbrunner avatar May 15 '25 09:05 sbrunner

Noted. I see that in that function it's not available, but would it be easy to use, e.g., the source line or character number in the schema file? Then identical JSON files would at least give the same result.
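A variation on this idea, since the standard `json` module returns plain dicts without line or character positions: derive the suffix from the element's path inside the schema instead. The sketch below is only an assumption about how that could look. `stable_suffix` and the example paths are hypothetical, not part of jsonschema-gentypes, and it swaps the line number for a short hash of the path.

```python
import hashlib


def stable_suffix(schema_path: str, digits: int = 4) -> str:
    """Return a short decimal suffix derived from a schema path.

    Hypothetical helper: the same path always yields the same suffix,
    so identical schema files would produce identical names.
    """
    digest = hashlib.sha256(schema_path.encode("utf-8")).hexdigest()
    # Reduce the hash to a fixed number of decimal digits, zero-padded.
    return str(int(digest, 16) % 10**digits).zfill(digits)


# Example paths (made up for illustration) for two tuple items.
path_a = "#/properties/filter/properties/maturity/items/0"
path_b = "#/properties/filter/properties/maturity/items/1"
name_a = "FilterMaturity" + stable_suffix(path_a)
name_b = "FilterMaturity" + stable_suffix(path_b)
print(name_a, name_b)
```

Two different paths can still hash to the same four digits, so a check against the set of already used names would still be needed.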

vron avatar May 15 '25 10:05 vron

Alternatively, and maybe even better, what about a small function that tries appending 0, 1, 2, 3... sequentially until it finds a name that is "free"?

That way the same code will be generated, provided that the source is parsed in the same order.

vron avatar May 15 '25 10:05 vron

I.e., effectively something like replacing:

    if not get_name.__dict__.get("names"):
        get_name.__dict__["names"] = set()
    elif output in get_name.__dict__["names"]:
        output += str(random.randint(0, 9999))  # noqa: S311 # nosec
    get_name.__dict__["names"].add(output)

with:

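    if not get_name.__dict__.get("names"):
        get_name.__dict__["names"] = set()
    names = get_name.__dict__["names"]

    # Named differently from the enclosing get_name() so it does not shadow it.
    def _unique_name(name: str) -> str:
        # Keep the name unchanged when it is not taken yet.
        if name not in names:
            return name
        # Otherwise try the deterministic suffixes 0, 1, 2, ... first.
        for i in range(100):
            candidate = f"{name}{i}"
            if candidate not in names:
                return candidate
        # Fall back to a random suffix if all 100 candidates are taken.
        return name + str(random.randint(0, 9999))  # noqa: S311 # nosec

    output = _unique_name(output)
    names.add(output)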
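For anyone who wants to try the idea outside the library, here is a minimal standalone sketch. `NameRegistry`, `unique` and `generate` are hypothetical names made up for this illustration, not part of jsonschema-gentypes; the class simply keeps the set of used names on an instance instead of on `get_name.__dict__["names"]`.

```python
from __future__ import annotations


class NameRegistry:
    """Hands out unique names using deterministic numeric suffixes.

    Hypothetical helper for illustration; not part of jsonschema-gentypes.
    """

    def __init__(self) -> None:
        self._names: set[str] = set()

    def unique(self, name: str) -> str:
        """Return `name`, or `name` plus the first free suffix 0, 1, 2, ..."""
        candidate = name
        suffix = 0
        while candidate in self._names:
            candidate = f"{name}{suffix}"
            suffix += 1
        self._names.add(candidate)
        return candidate


def generate(names: list[str]) -> list[str]:
    """Simulate one generation run with a fresh registry."""
    registry = NameRegistry()
    return [registry.unique(n) for n in names]


requested = ["FilterMaturity", "FilterMaturity", "RequestArea", "FilterMaturity"]
first_run = generate(requested)
second_run = generate(requested)
print(first_run)
# ['FilterMaturity', 'FilterMaturity0', 'RequestArea', 'FilterMaturity1']
print(first_run == second_run)
# True
```

The result only depends on the order in which names are requested, so two runs over the same schema give the same output as long as the traversal order is the same.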

vron avatar May 15 '25 10:05 vron

Looks good, can you do a pull request with that?

sbrunner avatar May 15 '25 10:05 sbrunner

I'll try to do that this weekend; I'm not allowed to pull down code on the PC I'm at at the moment.

vron avatar May 15 '25 10:05 vron