fastparquet
Python implementation of the Parquet columnar file format.
Fixes #921. Required due to a change in numpy's type promotion with numpy >= 2: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion [NEP 50 -- Promotion rules for Python scalars](https://numpy.org/neps/nep-0050-scalar-promotion.html) The `DAYS_TO_MILLIS` constant actually contained the...
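As a rough illustration of the kind of fix this describes, the sketch below casts day counts to `int64` explicitly before scaling, so the result dtype does not depend on which promotion rules are in effect. The names `MILLIS_PER_DAY` and `days_to_millis` are hypothetical and are not taken from fastparquet; the actual change to `DAYS_TO_MILLIS` may differ.

```python
import numpy as np

# Hypothetical constant for illustration: milliseconds in one day.
MILLIS_PER_DAY = 86_400_000


def days_to_millis(days):
    """Convert day counts to milliseconds with an explicit int64 cast."""
    # Casting first keeps the multiplication in 64-bit integers regardless
    # of how the installed numpy version promotes scalars.
    days64 = np.asarray(days).astype(np.int64)
    return days64 * MILLIS_PER_DAY


days = np.array([0, 1, 20000], dtype=np.int32)
millis = days_to_millis(days)
```

Without the cast, multiplying the `int32` array by this Python integer would stay in `int32` and overflow for large day counts.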
**Describe the issue**: Upgrading from numpy 1 to 2.0.0rc1, the fastparquet test suite starts to fail during the unit tests for the openSUSE rpm packaging builds. ```python [ 76s] _______________________________...
No longer allows setting series values in-place. Thanks pandas.
**Describe the issue**: Not sure if this is a fastparquet or pyarrow (or pandas) issue, but I noticed that a column with pandas categorical dtype is read as object dtype...
In https://github.com/dask/dask/pull/9979, we added support for using Arrow data types when reading parquet with `pyarrow` engine. I want to start a discussion on whether it makes sense to also support...
**Describe the issue**: Hello, I have a peculiar field type in my parquet file: List of Lists of strings. For example: 0 [] 1 [["hello"]] 2 [["hello", "bye"]] 3 [["hello"],...
Just wanted to ensure you were aware that PyArrow will become a required dependency with pandas 3.0 and I made this issue to address the implications for parquet support: -...
**Describe the issue**: When we write row groups, schema evolution should be easy and supported. This is very important for long-lived live datasets; we usually want to...
I've seen ``` FAILED dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df12-write_kwargs12-read_kwargs12] - ValueError: Buffer has wrong number of dimensions (expected 1, got 2) FAILED dask/dataframe/io/tests/test_parquet.py::test_roundtrip[fastparquet-df13-write_kwargs13-read_kwargs13] - ValueError: Buffer has wrong number of dimensions (expected 1, got...
For some reason, I need to write Parquet file content to a buffer (`io.BytesIO`), but it seems this package always closes the file object after writing. For example: ```python3 data = [{"x": 1,...
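One generic workaround for a writer that always closes its file object is a `BytesIO` subclass whose `close()` does nothing, so the bytes can still be read afterwards. This is a minimal sketch using only the standard library: `NonClosingBytesIO` and `write_and_close` are hypothetical names, and `write_and_close` merely stands in for a library call that closes the object. Whether and how fastparquet accepts such an object is not shown here.

```python
import io


class NonClosingBytesIO(io.BytesIO):
    """BytesIO that ignores close() so its contents outlive the writer."""

    def close(self):
        # Deliberately a no-op; call really_close() when finished.
        pass

    def really_close(self):
        # Release the underlying buffer for real.
        super().close()


def write_and_close(fileobj, payload):
    """Hypothetical stand-in for a writer that always closes its file object."""
    fileobj.write(payload)
    fileobj.close()


buf = NonClosingBytesIO()
write_and_close(buf, b"example bytes")
content = buf.getvalue()  # still readable because close() was ignored
```

Call `buf.really_close()` once the contents have been copied out, since the buffer is otherwise released only when the object is garbage collected.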