Daniel Han
I found that during the Pyarrow conversion, if you pass in a `types_mapper` and set `ignore_metadata` to `False`, it works!

```python
mapping = {schema.type : pd.ArrowDtype(schema.type) for schema in data.schema}
data.to_pandas(types_mapper...
```
Hmm, so I looked at the Pandas code, and I'm not sure if using `pd.ArrowDtype(dtype)` will work. The issue is that `data.schema.pandas_metadata['columns'][7]["numpy_type"]` is a `str` and not an actual `type` object, and...
@phofl Oh oops I forgot to mention I tried `pd.read_parquet(..., dtype_backend = "pyarrow")`, and the `TypeError` still exists. The error is exactly the same, since it passes the dtype to...
Confirmed it still fails:

```python
import pandas as pd
import pyarrow as pa

pyarrow_list_of_strings = pd.ArrowDtype(pa.list_(pa.string()))
data = pd.DataFrame({
    "Pyarrow" : pd.Series([["a"], ["a", "b"]], dtype = pyarrow_list_of_strings),
})
data.to_parquet("data.parquet")
# ...
```
Yeah, that works since it's an `object` - Pyarrow indeed saves the data inside the parquet file as `list[string]`. The issue is that if you explicitly parse `list[string]` directly, it does...
In fact, the `object` schema is converted:

```python
pa.parquet.read_table("data.parquet")
```

returns

```
pyarrow.Table
Pyarrow: list<item: string>
  child 0, item: string
----
Pyarrow: [[["a"],["a","b"]]]
```
Maybe a `try` `except` so as to not break other parts of the Pandas repo?

https://github.com/apache/arrow/blob/a77aab07b02b7d0dd6bd9c9a11c4af067d26b674/python/pyarrow/pandas_compat.py#L855

```...
@takacsd oh interesting - so it's possible it's the schema-storing component that's wrong?
@takacsd oh yep, your reasoning sounds right - so I think adding a simple try/except might be a simple fix? Try calling numpy first, then if that fails, call `pd.ArrowDtype`.
The main issue, I think, is that `dtype` is a string. I'm not 100% sure about how `_pandas_api.pandas_dtype` works, but presumably it's a large `dict` mapping types in string...