fastparquet issues

Exclude test directory from distributed wheel

9

When trying to optimize the speed of a serverless/Lambda deployment, I found that the fastparquet wheel contains a test folder of ~80 MB. Could this be excluded from the distribution...

gs11

RuntimeWarning: invalid value encountered in reduce return umr_maximum(a, axis, None, out, keepdims, initial)

13

This warning message is not understandable. I searched the net and found some references to it but have not been able to understand whether it is important or not. Using...

BaruchYoussin

Parquet V2: AttributeError: 'NoneType' object has no attribute 'num_values'

12

**Code:**

```
from fastparquet import ParquetFile
pf = ParquetFile('/path/file.parquet')
df = pf.to_pandas()
```

**Error:**

```
File "/home/.../venv/lib64/python3.7/site-packages/fastparquet/core.py", line 112, in read_data_page
    nval = daph.num_values - num_nulls
AttributeError: 'NoneType' object has...
```

bgbraga

Storing ParquetFile as a parquet file in filesystem

4

I'm looking for a way to dump the ParquetFile to an actual file. I've tried the [write method](https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write), replacing the pandas data with a ParquetFile object, but I receive the following error: `TypeError:...

madarez

packaging import issue

1

It looks like the `packaging` package was added to requirements.txt and merged to master, but the 0.4.0 tag still has the import issue when installed with pip. As a workaround, I need...

Srivani247

row_groups filters does not use min_value/max_value statistics

6

Hello, in the file `api.py`, the function `filter_out_stats` uses the `min` and `max` statistic fields, but they are marked as deprecated in the parquet thrift specification (https://github.com/apache/arrow/blob/master/cpp/src/parquet/parquet.thrift, line 201). As some...

cclienti
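
One way to picture the request in this issue is a lookup that prefers the newer `min_value`/`max_value` fields and falls back to the deprecated `min`/`max` only when the newer ones are absent. The sketch below is not fastparquet's code: the `Stats` class is a hypothetical stand-in for the thrift statistics struct, and `stat_bounds` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Stats:
    """Hypothetical stand-in for a column chunk's thrift statistics struct."""
    min: Optional[Any] = None        # deprecated field
    max: Optional[Any] = None        # deprecated field
    min_value: Optional[Any] = None  # preferred field
    max_value: Optional[Any] = None  # preferred field


def stat_bounds(stats: Stats) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (lower, upper), preferring min_value/max_value over min/max."""
    lower = stats.min_value if stats.min_value is not None else stats.min
    upper = stats.max_value if stats.max_value is not None else stats.max
    return lower, upper


# A file written with only the deprecated fields still yields bounds.
old_style = stat_bounds(Stats(min=1, max=9))
# When both pairs are present, the newer pair wins.
new_style = stat_bounds(Stats(min=1, max=9, min_value=2, max_value=8))
```

A row-group filter built on such a helper could then skip a row group whenever the requested value lies outside the returned bounds, regardless of which pair of fields the writer populated.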

building wheel for fastparquet ... error

5

python version: `Python 3.8.2`

command: `pip install fastparquet`

error: `Building wheels for collected packages: fastparquet Building wheel for fastparquet (setup.py) ... error` ` building 'fastparquet.speedups' extension error: Microsoft Visual C++...

extreme4all

ValueError: Don't know how to convert data type: Int64

5

I am trying to write a parquet file using Dask and fastparquet from a DataFrame that has a column of type 'Int64' (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). Unfortunately, I got the following error:...

nrodri86

fastparquet 0.3.2, Pandas 1.0.0 deprecation warnings, RangeIndex

3

```
lib\fastparquet\writer.py:655: FutureWarning: RangeIndex._start is deprecated and will be removed in a future version. Use RangeIndex.start instead
  index_cols = [{'name': index_cols.name, 'start': index_cols._start,
lib\fastparquet\writer.py:656: FutureWarning: RangeIndex._step is deprecated and will...
```

apiszcz
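
The warnings in this issue point at the public `RangeIndex.start` attribute as the replacement for the private `_start`, and pandas exposes `stop` and `step` the same way. A minimal sketch of building index metadata from the public attributes follows; the function name and the exact set of dictionary keys are assumptions for illustration, not the contents of `writer.py`.

```python
import pandas as pd


def range_index_metadata(index: pd.RangeIndex) -> dict:
    """Describe a RangeIndex using only its public attributes.

    The key layout here is illustrative; the real writer's dict is
    truncated in the warning text and may differ.
    """
    return {
        "name": index.name,
        "start": index.start,  # public replacement for index._start
        "stop": index.stop,    # public replacement for index._stop
        "step": index.step,    # public replacement for index._step
    }


meta = range_index_metadata(pd.RangeIndex(start=0, stop=10, step=2, name="idx"))
```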

AttributeError: 'SparseDtype' object has no attribute 'itemsize' (Support for Pandas SparseArray columns)

9

Fastparquet does not appear to support writing Dask dataframes with Pandas SparseArray columns. Doing so fails with:

```
AttributeError: 'SparseDtype' object has no attribute 'itemsize'
```

Pandas: 0.25.1
Dask: 2.4.0...

danielchalef
