[Python] 'pyarrow._parquet.SortingColumn' object has no attribute 'to_dict'

Open djkapner opened this issue 1 year ago • 2 comments

Describe the bug, including details regarding any error messages, version, and platform.

When a row group has a SortingColumn, the metadata of a ParquetFile cannot be serialized with to_dict(), because SortingColumn does not implement this method.

import polars as pl
import pyarrow.parquet as pq

# Write a small table and record that column 0 is sorted
df = pl.DataFrame({"a": [1, 2], "b": [10, 11]})
fname = "tmp.parquet"
pq.write_table(
    df.to_arrow(),
    fname,
    sorting_columns=[pq.SortingColumn(0)],
)

pqf = pq.ParquetFile(fname)
# Reading the sorting column back works
print(pqf.metadata.row_group(0).sorting_columns[0])
# Serializing the file metadata raises AttributeError
print(pqf.metadata.to_dict())

This results in:

SortingColumn(column_index=0, descending=False, nulls_first=False)
...

  File "pyarrow/_parquet.pyx", line 892, in pyarrow._parquet.FileMetaData.to_dict
  File "pyarrow/_parquet.pyx", line 790, in pyarrow._parquet.RowGroupMetaData.to_dict
AttributeError: 'pyarrow._parquet.SortingColumn' object has no attribute 'to_dict'
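
Until a fixed release is available, a possible workaround is to skip metadata.to_dict() and read the sorting information directly. The sketch below is not part of pyarrow: the helper names sorting_column_to_dict and sorting_columns_by_row_group are hypothetical, and it assumes the column_index, descending and nulls_first fields shown in the repr above are readable as attributes.

```python
def sorting_column_to_dict(sorting_column):
    """Build a plain dict from the three fields shown in the SortingColumn repr.

    Hypothetical helper: assumes the fields are exposed as attributes.
    """
    return {
        "column_index": sorting_column.column_index,
        "descending": sorting_column.descending,
        "nulls_first": sorting_column.nulls_first,
    }


def sorting_columns_by_row_group(metadata):
    """Return one list of dicts per row group of a FileMetaData-like object."""
    result = []
    for i in range(metadata.num_row_groups):
        # sorting_columns may be empty (or None) when no sort order was recorded
        columns = metadata.row_group(i).sorting_columns or ()
        result.append([sorting_column_to_dict(c) for c in columns])
    return result
```

With the reproduction above, this would be called as sorting_columns_by_row_group(pqf.metadata). It only covers the sorting columns; the rest of the file metadata still has to be read through its individual attributes while to_dict() fails.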

Component(s)

Parquet, Python

djkapner avatar May 17 '24 02:05 djkapner

@tlm365 Can you reply here? I don't know why this PR isn't assigned to you :-( Maybe you can comment "take" or reply here first, and then I can assign the issue to you.

mapleFU avatar May 17 '24 14:05 mapleFU

take

tlm365 avatar May 17 '24 14:05 tlm365

Issue resolved by pull request 41704 https://github.com/apache/arrow/pull/41704

AlenkaF avatar May 21 '24 08:05 AlenkaF