
Question: Supporting Parquet datatypes

Open ayushbindlish opened this issue 11 months ago • 5 comments

I have a use case where I need to generate data for Parquet datatypes. I am currently using a custom version of JSF. Would you like to have this feature here?

JSON looks like the following:

"UInt32": {
  "type": "uint32"
},
"UInt64": {
  "type": "uint64"
},
"Float16": {
  "type": "float16"
}

[number.py:jsf.src.schema_types.number:line 304 - generate()] - INFO: Generating random uint32
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 35227457
[number.py:jsf.src.schema_types.number:line 333 - generate()] - INFO: Generating random uint64
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 4669327448559716910
[number.py:jsf.src.schema_types.number:line 362 - generate()] - INFO: Generating random float16
[number.py:jsf.src.schema_types.number:line 57 - generate()] - DEBUG: is_float: True
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 1.920763087895552e+17

ayushbindlish avatar Mar 19 '24 20:03 ayushbindlish

I don't believe JSON schema supports those numeric types, unless you can point me to the definition in the schema?

ghandic avatar Mar 19 '24 21:03 ghandic

As you correctly mentioned, JSON Schema does not support these datatypes; this is just for ease of data generation. Each datatype has an implicit range, which can be narrowed using "minimum" and "maximum", but generated values should always fall within the range for that datatype.
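To illustrate the range rule described above, here is a minimal sketch, not the custom JSF code mentioned in this issue. The `IMPLICIT_RANGES` table and the `effective_range` and `generate` helpers are hypothetical names introduced for this example; only the type names come from the JSON above.

```python
import random

# Implicit bounds per type name. The type names come from the JSON above;
# this table and the helper functions are illustrative assumptions.
IMPLICIT_RANGES = {
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
    "float16": (-65504.0, 65504.0),  # largest finite IEEE 754 half-precision value
}


def effective_range(type_name, minimum=None, maximum=None):
    """Intersect the user's minimum/maximum with the type's implicit range."""
    lo, hi = IMPLICIT_RANGES[type_name]
    if minimum is not None:
        lo = max(lo, minimum)
    if maximum is not None:
        hi = min(hi, maximum)
    if lo > hi:
        raise ValueError(f"empty range for {type_name}: [{lo}, {hi}]")
    return lo, hi


def generate(schema):
    """Draw one random value that respects both the schema and the type bounds."""
    lo, hi = effective_range(
        schema["type"], schema.get("minimum"), schema.get("maximum")
    )
    if schema["type"].startswith("float"):
        return random.uniform(lo, hi)
    return random.randint(lo, hi)


print(generate({"type": "uint32", "minimum": 10, "maximum": 20}))
print(generate({"type": "float16"}))
```

A "minimum" or "maximum" outside the implicit range is clamped rather than rejected here; raising an error instead would be an equally reasonable choice.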

ayushbindlish avatar Mar 19 '24 21:03 ayushbindlish

I wouldn't add those types into jsf directly, but if you proposed a PR that lets people extend the JSON Schema types with custom types, that would work.

Then you would just manage your custom type generator classes outside of jsf.

ghandic avatar Mar 19 '24 21:03 ghandic

@ghandic Any ideas on how this can be done?

ayushbindlish avatar Apr 03 '24 09:04 ayushbindlish

Yes, unfortunately I don't have time to implement it, but PRs are welcome.

It would be along the lines of making subclasses of the given base class and defining a mapping from each type to the class that should be run.
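A rough sketch of that shape, assuming nothing about jsf's real internals: `BaseGenerator`, `TYPE_REGISTRY`, and `register_type` are hypothetical names invented for this example and are not part of jsf's API.

```python
import random
from abc import ABC, abstractmethod


class BaseGenerator(ABC):
    """Hypothetical stand-in for the base class a library would expose."""

    def __init__(self, schema):
        self.schema = schema

    @abstractmethod
    def generate(self):
        """Return one random value for this schema node."""


# Mapping from a schema "type" string to the class that should be run.
TYPE_REGISTRY = {}


def register_type(name):
    """Class decorator that records a generator class under a type name."""
    def decorator(cls):
        TYPE_REGISTRY[name] = cls
        return cls
    return decorator


@register_type("uint32")
class UInt32Generator(BaseGenerator):
    """Custom type kept outside the library and plugged in via the registry."""

    def generate(self):
        lo = max(0, self.schema.get("minimum", 0))
        hi = min(2**32 - 1, self.schema.get("maximum", 2**32 - 1))
        return random.randint(lo, hi)


def generate(schema):
    """Look up the class registered for the schema's type and run it."""
    try:
        cls = TYPE_REGISTRY[schema["type"]]
    except KeyError:
        raise ValueError(
            f"no generator registered for type {schema['type']!r}"
        ) from None
    return cls(schema).generate()


print(generate({"type": "uint32", "maximum": 100}))
```

With a registry like this, types such as uint64 or float16 would be added by registering further subclasses, without changing the dispatch code.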

ghandic avatar Apr 03 '24 09:04 ghandic