fastparquet
Python implementation of the Parquet columnar file format.
When trying to optimize the speed of a serverless/Lambda deployment, I found that the fastparquet wheel contains a test folder of ~80 MB. Could this be excluded from the distribution...
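One way to check how much of a wheel a bundled test folder accounts for is to sum the uncompressed sizes of the archive entries under that folder. The sketch below uses only the standard library; the helper name `size_under_prefix` and the wheel path and folder prefix in the usage note are illustrative assumptions, not taken from the project.

```python
import zipfile


def size_under_prefix(wheel_path, prefix):
    """Return (bytes under prefix, total bytes), both uncompressed.

    A wheel is a zip archive, so its entries can be listed with zipfile.
    `prefix` is an archive-internal path such as "somepackage/test/".
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        infos = wheel.infolist()
    total = sum(info.file_size for info in infos)
    under = sum(
        info.file_size for info in infos if info.filename.startswith(prefix)
    )
    return under, total
```

For example, `size_under_prefix("some.whl", "fastparquet/test/")` would report how many bytes sit under that folder, assuming that is where the test data lives in the wheel being inspected.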
This warning message is not understandable. I searched the net and found some references to it but have not been able to understand whether it is important or not. Using...
**Code:**
```
from fastparquet import ParquetFile
pf = ParquetFile('/path/file.parquet')
df = pf.to_pandas()
```
**Error:**
```
File "/home/.../venv/lib64/python3.7/site-packages/fastparquet/core.py", line 112, in read_data_page
    nval = daph.num_values - num_nulls
AttributeError: 'NoneType' object has...
```
I'm looking for a way to dump a ParquetFile to an actual file. I've tried the [write method](https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write), replacing the pandas data with a ParquetFile object, but I receive the following error: `TypeError:...
It looks like the `packaging` package was added to requirements.txt and merged to master, but the 0.4.0 tag still gives an issue when installing with pip. As a workaround, I need...
Hello, in the file `API.py`, the function `filter_out_stats` uses the `min` and `max` statistics fields, but they are marked as deprecated in the Parquet Thrift specification: https://github.com/apache/arrow/blob/master/cpp/src/parquet/parquet.thrift (line 201). As some...
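A minimal sketch of the fallback this issue points toward: prefer the newer `min_value` and `max_value` statistics fields and use the deprecated `min` and `max` only when the newer ones are absent. The `Stats` class and both helper functions are hypothetical stand-ins, not fastparquet's code, and the sketch assumes the statistics have already been decoded into comparable Python values.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Stats:
    """Hypothetical stand-in for the Thrift Statistics struct (decoded values)."""
    min: Optional[Any] = None        # deprecated field
    max: Optional[Any] = None        # deprecated field
    min_value: Optional[Any] = None  # replacement for min
    max_value: Optional[Any] = None  # replacement for max


def stat_bounds(stats: Stats) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (lower, upper), preferring the non-deprecated fields."""
    lower = stats.min_value if stats.min_value is not None else stats.min
    upper = stats.max_value if stats.max_value is not None else stats.max
    return lower, upper


def may_contain(stats: Stats, value: Any) -> bool:
    """Return False only when the statistics prove `value` is out of range."""
    lower, upper = stat_bounds(stats)
    if lower is None or upper is None:
        return True  # no usable statistics, so the chunk cannot be skipped
    return lower <= value <= upper
```

Treating missing statistics as "may contain" keeps the filter conservative: a chunk is skipped only when both bounds are known and exclude the value.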
python version: `Python 3.8.2` command: `pip install fastparquet` error: `Building wheels for collected packages: fastparquet Building wheel for fastparquet (setup.py) ... error` ` building 'fastparquet.speedups' extension error: Microsoft Visual C++...
I am trying to write a Parquet file with Dask and fastparquet from a DataFrame that has a column of type 'Int64' (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). Unfortunately, I got the following error:...
```
lib\fastparquet\writer.py:655: FutureWarning: RangeIndex._start is deprecated and will be removed in a future version. Use RangeIndex.start instead
  index_cols = [{'name': index_cols.name, 'start': index_cols._start,
lib\fastparquet\writer.py:656: FutureWarning: RangeIndex._step is deprecated and will...
```
Fastparquet does not appear to support writing Dask dataframes with Pandas SparseArray columns. Doing so fails with:
```
AttributeError: 'SparseDtype' object has no attribute 'itemsize'
```
Pandas: 0.25.1 Dask: 2.4.0...