Daniel Han
I found that during the Pyarrow conversion, if you pass in a `types_mapper` and set `ignore_metadata` to `False`, it works!

```python
mapping = {schema.type : pd.ArrowDtype(schema.type) for schema in data.schema}
data.to_pandas(types_mapper...
```
Hmm, so I looked at the Pandas code, and I'm not sure if using `pd.ArrowDtype(dtype)` will work. The issue is that `data.schema.pandas_metadata['columns'][7]["numpy_type"]` is a `str` and not an actual `type` object, and...
@phofl Oh oops I forgot to mention I tried `pd.read_parquet(..., dtype_backend = "pyarrow")`, and the `TypeError` still exists. The error is exactly the same, since it passes the dtype to...
Confirmed it still fails:

```python
import pandas as pd
import pyarrow as pa

pyarrow_list_of_strings = pd.ArrowDtype(pa.list_(pa.string()))
data = pd.DataFrame({
    "Pyarrow" : pd.Series([["a"], ["a", "b"]], dtype = pyarrow_list_of_strings),
})
data.to_parquet("data.parquet")
# ...
```
Yeah, that works since it's an `object` - Pyarrow indeed saves the data inside the parquet file as `list[string]`. The issue is that if you explicitly parse `list[string]` directly, it does...
In fact, the `object` schema is converted:

```python
pa.parquet.read_table("data.parquet")
```

returns

```
pyarrow.Table
Pyarrow: list<item: string>
  child 0, item: string
----
Pyarrow: [[["a"],["a","b"]]]
```
Maybe a `try` `except` so as to not break other parts of the Pandas repo?

https://github.com/apache/arrow/blob/a77aab07b02b7d0dd6bd9c9a11c4af067d26b674/python/pyarrow/pandas_compat.py#L855

```...
@takacsd oh interesting - so it's possible it's the schema-storing component that's wrong?
@takacsd oh yep, your reasoning sounds right - so I think adding a simple try/except might be a simple fix? Try calling numpy first, then if that fails, call `pd.ArrowDtype`.
The main issue, I think, is that `dtype` is a string. I'm not 100% sure about how `_pandas_api.pandas_dtype` works, but presumably it's a large `dict` mapping types in string...