json2parquet

Convert JSON files to Parquet using PyArrow

Results 16 json2parquet issues

Once again pyarrow has advanced several versions beyond the maximum version supported by this package. Would it be possible to support newer versions of pyarrow? Also, since pandas 2.x has...

I am relatively new to Python so apologies if I'm interpreting the code incorrectly. What I'm trying to do is read a schema from an existing Parquet file and then...

I had pyarrow, which I installed via `conda install -c conda-forge pyarrow`, and then I ran `pip install json2parquet`. I got the error: " module 'pyarrow' has no...

In JSON, my timestamp field looks like `"block_timestamp":1333413850`. After conversion to Parquet using pa.timestamp('s', tz='UTC') and use_deprecated_int96_timestamps=True, my timestamp starts looking like `block_timestamp = zg+IUgAAAACMPSUA` (FYI, I used parquet-tools to check...
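One way to inspect a value like the one above is to decode it by hand. The sketch below assumes the printed string is the base64 form of a 12-byte legacy INT96 timestamp, laid out as 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day number. The helper name `decode_int96_timestamp` is hypothetical and is not part of json2parquet or PyArrow.

```python
import base64
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(b64_value):
    """Decode a base64-printed INT96 timestamp (assumed layout, see above)."""
    raw = base64.b64decode(b64_value)
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian int64 (nanoseconds of day) + int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only has microsecond resolution, so nanoseconds are truncated.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


decoded = decode_int96_timestamp("zg+IUgAAAACMPSUA")
print(decoded)
```

Under that assumption the sample decodes to roughly 1.38 seconds after the Unix epoch, which would be consistent with an epoch-seconds integer having been stored as nanoseconds rather than with the data being corrupted.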

The current implementation uses `ns` as a fixed unit for timestamp fields: https://github.com/andrewgross/json2parquet/blob/master/json2parquet/client.py#L98 When using `pa.Schema`, the unit is ignored: ```python schema = pa.schema([ pa.field('a', pa.int64()), pa.field('b', pa.string()), pa.field('c', pa.float64()),...

### Requirement Support reading data from nginx logs, which have timestamps in the format `1564388230.097`. ### Approach We can include a `unit` param to read epoch time [here](https://github.com/andrewgross/json2parquet/blob/master/json2parquet/client.py#L95) as defined [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)...
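To illustrate the `unit` idea from that request, the sketch below parses an nginx-style epoch value with `pandas.to_datetime`. It assumes the value arrives as a string and converts it to integer milliseconds first, so that float rounding cannot disturb the fractional part. The helper name `parse_epoch_seconds` is hypothetical and is not part of json2parquet.

```python
from decimal import Decimal

import pandas as pd


def parse_epoch_seconds(text):
    """Parse a string such as '1564388230.097' (epoch seconds) to a Timestamp."""
    # Decimal keeps the digits exact; any precision finer than
    # milliseconds is truncated by int().
    millis = int(Decimal(text) * 1000)
    return pd.to_datetime(millis, unit="ms", utc=True)


parsed = parse_epoch_seconds("1564388230.097")
print(parsed)
```

Passing the float directly with `unit="s"` is also possible, but going through integer milliseconds avoids relying on how the float is rounded.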

What is the best way to extract the year, month, and day from a timestamp and use them to partition Parquet output when writing to disk or s3fs?

A couple of changes: 1) Changed JSON file reading to iterate over the file with a `for` loop (faster and cleaner according to the documentation) 2) Merged the load schema function and convert data with column name...

People do weird stuff in JSON. A lot of systems kinda figure stuff out. Redshift will convert strings to INTs for you etc. PyArrow purposely avoids doing unexpected stuff, and...