jsonschema-gentypes
Reproducible generation
I have a large-ish (9k lines) schema.json file that is used to generate Python types. However, I find that re-running jsonschema-gentypes several times on the same schema file generates different output .py files.
To be clear, the generated output is all "functionally equivalent", but types and default values are given different names (with random numbers appended), for example:
`
-_REQUEST_AREA8181_DEFAULT = 'Nord'
+_REQUEST_AREA4825_DEFAULT = 'Nord'
-_PARSED_SOURCE_REPR_MODELS5526_DEFAULT = {'prop': {}, 'spot': {}, 'tr': {}}
+_PARSED_SOURCE_REPR_MODELS3775_DEFAULT = {'prop': {}, 'spot': {}, 'tr': {}}
-    maturity: Required[tuple["FilterMaturity09394", "FilterMaturity17630"]]
+    maturity: Required[tuple["FilterMaturity07100", "FilterMaturity12908"]]
`
Is creating reproducible files a goal of jsonschema-gentypes, such that it would be of interest to investigate why this is happening and fix it? Or is it not?
Indeed, in some cases I generate a random number to avoid duplicate names: https://github.com/sbrunner/jsonschema-gentypes/blob/b2ecef8520a628c6601bbaa159359d13029d5898/jsonschema_gentypes/init.py#L794
The current way to avoid this is to set a unique title on the element concerned in the JSON schema, but I understand that's not always possible.
So any better solution to avoid this is welcome :-)
Noted. I see that it's not available in that function, but would it be easy to use, for example, the source line or character number in the schema file? Then identical JSON files would at least give the same result.
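A minimal sketch of that idea, with one swap stated plainly: instead of a line or character number, it hashes a location string for the element. It assumes a JSON-pointer-like path could be passed down to the naming code, which nothing in this thread confirms; `stable_suffix`, `name_for` and the example path are hypothetical names, not part of jsonschema-gentypes.

```python
import hashlib


def stable_suffix(schema_path: str, digits: int = 4) -> str:
    """Derive a short numeric suffix from an element's location in the schema.

    `schema_path` is assumed to be something like a JSON pointer; the same
    path always yields the same suffix, on every run and every machine.
    """
    digest = hashlib.sha256(schema_path.encode("utf-8")).hexdigest()
    # Reduce the hash to `digits` decimal digits, zero-padded.
    return str(int(digest, 16) % 10**digits).zfill(digits)


def name_for(base: str, schema_path: str) -> str:
    """Hypothetical helper: append the location-derived suffix to a base name."""
    return base + stable_suffix(schema_path)


# Illustrative path only; it does not come from a real schema.
path = "#/properties/filter/properties/maturity/items/0"
first_run = name_for("FilterMaturity", path)
second_run = name_for("FilterMaturity", path)
print(first_run == second_run)
```

Two different paths can still hash to the same four digits, so a check against the set of names already in use would remain necessary.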
Alternatively, and maybe even better, what about a small function that instead tries appending 0, 1, 2, 3... in sequence until it finds a name that is "free"?
That way the same code will be generated, provided that the source is parsed in the same order.
I.e., effectively something like replacing:
if not get_name.__dict__.get("names"):
    get_name.__dict__["names"] = set()
elif output in get_name.__dict__["names"]:
    output += str(random.randint(0, 9999))  # noqa: S311 # nosec
get_name.__dict__["names"].add(output)
With:
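if not get_name.__dict__.get("names"):
    get_name.__dict__["names"] = set()
names = get_name.__dict__["names"]
# The helper must not be called get_name: that would shadow the enclosing function.
def unique_name(name: str) -> str:
    # Keep the plain name if it is still free.
    if name not in names:
        return name
    # Otherwise try the suffixes 0, 1, 2, ... in order.
    for i in range(100):
        if f"{name}{i}" not in names:
            return f"{name}{i}"
    # Fall back to a random suffix if the first 100 are taken.
    return name + str(random.randint(0, 9999))  # noqa: S311 # nosec
output = unique_name(output)
names.add(output)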
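To see the effect outside the library, here is a self-contained sketch of the same sequential-suffix idea. `NameRegistry` and `generate` are hypothetical names used only for illustration, and the input names are made up; the point is that two runs over the same sequence of names give identical output.

```python
import random


class NameRegistry:
    """Hands out unique names, preferring deterministic numeric suffixes."""

    def __init__(self, max_attempts: int = 100) -> None:
        self._names: set[str] = set()
        self._max_attempts = max_attempts

    def unique(self, name: str) -> str:
        candidate = name
        if candidate in self._names:
            # Try name0, name1, ... in order so the result depends only on
            # the order in which names are requested.
            for i in range(self._max_attempts):
                if f"{name}{i}" not in self._names:
                    candidate = f"{name}{i}"
                    break
            else:
                # Last resort, as in the current code: a random suffix.
                candidate = name + str(random.randint(0, 9999))
        self._names.add(candidate)
        return candidate


def generate(requested: list[str]) -> list[str]:
    """Simulate one generator run with a fresh registry."""
    registry = NameRegistry()
    return [registry.unique(name) for name in requested]


requested = ["FilterMaturity", "FilterMaturity", "RequestArea", "FilterMaturity"]
first = generate(requested)
second = generate(requested)
print(first)
# ['FilterMaturity', 'FilterMaturity0', 'RequestArea', 'FilterMaturity1']
print(first == second)  # True
```

The output is only stable if the schema is traversed in the same order on every run, which is the caveat already mentioned above.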
Looks good, can you do a pull request with that?
I'll try to do that this weekend; I'm not allowed to pull down code on the PC I'm at at the moment.