jsf
Question: Supporting Parquet datatypes
I have a use case where I need to generate data for Parquet datatypes. I am currently using a custom version of JSF. Would you like to have this feature here?
JSON looks like the following:
    "UInt32": {
        "type": "uint32"
    },
    "UInt64": {
        "type": "uint64"
    },
    "Float16": {
        "type": "float16"
    }
[number.py:jsf.src.schema_types.number:line 304 - generate()] - INFO: Generating random uint32
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 35227457
[number.py:jsf.src.schema_types.number:line 333 - generate()] - INFO: Generating random uint64
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 4669327448559716910
[number.py:jsf.src.schema_types.number:line 362 - generate()] - INFO: Generating random float16
[number.py:jsf.src.schema_types.number:line 57 - generate()] - DEBUG: is_float: True
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 1.920763087895552e+17
I don't believe JSON schema supports those numeric types, unless you can point me to the definition in the schema?
As you correctly mentioned, JSON Schema does not support these datatypes; they are only there for ease of data generation. Each datatype has an implicit range, which can be narrowed using "minimum" and "maximum", but generated values should always fall within the range for that datatype.
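To illustrate the idea of implicit ranges, here is a minimal sketch of how "minimum" and "maximum" could be clamped to a datatype's own bounds. This is not jsf code: the names IMPLICIT_RANGES, effective_range and generate_number are hypothetical, and the float16 bound assumes the IEEE 754 half-precision maximum of 65504.

```python
import math
import random

# Hypothetical table of implicit bounds per custom type (not part of jsf or JSON Schema).
IMPLICIT_RANGES = {
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
    "float16": (-65504.0, 65504.0),  # largest finite IEEE 754 half-precision value
}


def effective_range(schema):
    """Intersect the user's minimum/maximum with the datatype's implicit range."""
    type_lo, type_hi = IMPLICIT_RANGES[schema["type"]]
    lo = max(type_lo, schema.get("minimum", type_lo))
    hi = min(type_hi, schema.get("maximum", type_hi))
    if lo > hi:
        raise ValueError(f"empty range for {schema['type']}: [{lo}, {hi}]")
    return lo, hi


def generate_number(schema, rng=random):
    """Draw a value that respects both the schema and the datatype bounds."""
    lo, hi = effective_range(schema)
    if schema["type"].startswith("float"):
        # The result is in range, but is not rounded to a representable float16.
        return rng.uniform(lo, hi)
    # Integer types: round fractional bounds inwards before sampling.
    return rng.randint(math.ceil(lo), math.floor(hi))
```

With this approach a schema such as {"type": "uint32", "minimum": -5, "maximum": 10} would be sampled from 0 to 10, because the lower bound is raised to the datatype's own minimum.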
I wouldn't add those types into jsf directly, but if you proposed a PR that allows people to extend the JSON Schema types with custom types, it would work.
Then you would just manage your custom type generator classes outside of jsf.
@ghandic Any ideas on how this can be done?
Yes, unfortunately I don't have time to implement it, but PRs are welcome.
It would be along the lines of making subclasses of the given base class and defining a mapping from a type to the class that should be run.
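A rough sketch of what that suggestion could look like, under assumptions: BaseGenerator stands in for whatever base class jsf actually exposes, and TYPE_REGISTRY, register and generate_value are hypothetical names invented for illustration, not existing jsf API.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseGenerator(ABC):
    """Hypothetical stand-in for the library's base schema-type class."""

    def __init__(self, schema: Dict[str, Any]) -> None:
        self.schema = schema

    @abstractmethod
    def generate(self) -> Any:
        """Return one random value for this schema."""


# Mapping from a "type" string to the class that should be run for it.
TYPE_REGISTRY: Dict[str, Type[BaseGenerator]] = {}


def register(type_name: str):
    """Class decorator that adds a generator class to the registry."""
    def decorator(cls: Type[BaseGenerator]) -> Type[BaseGenerator]:
        TYPE_REGISTRY[type_name] = cls
        return cls
    return decorator


@register("uint32")
class UInt32(BaseGenerator):
    """Custom type kept outside the library, as suggested in the thread."""

    def generate(self) -> int:
        lo = max(0, self.schema.get("minimum", 0))
        hi = min(2**32 - 1, self.schema.get("maximum", 2**32 - 1))
        return random.randint(lo, hi)


def generate_value(schema: Dict[str, Any]) -> Any:
    """Look up the class registered for the schema's type and run it."""
    type_name = schema["type"]
    if type_name not in TYPE_REGISTRY:
        raise ValueError(f"no generator registered for type {type_name!r}")
    return TYPE_REGISTRY[type_name](schema).generate()
```

The point of the registry is that the library only needs to own the lookup; custom classes such as UInt32 can live in the user's own code and be registered at import time.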