GH-36593: [Python] Add rename_columns method to pyarrow datasets
Rationale for this change
See https://github.com/apache/arrow/issues/36593. In particular, this change is convenient when the column names stored in a file differ from the logical names associated with the columns (see the Delta Lake column mapping feature for an example).
What changes are included in this PR?
Adds the rename_columns method to datasets in pyarrow.
This method allows a user to rename the columns in the data returned from a scan before actually creating a scanner object.
Are these changes tested?
This PR also adds a test for the new rename_columns method using an InMemoryDataset.
Are there any user-facing changes?
Adds the rename_columns method to pyarrow datasets.
- GitHub Issue: #36593
:warning: GitHub issue #36593 has been automatically assigned in GitHub to PR creator.
@rok @raulcd @AlenkaF It's ready to merge now, could you take a look?
I am not sure about the changes in this PR, mainly because I am not very knowledgeable when it comes to Acero and datasets. The functionality seems great to have, but modifying _scan_options to rename columns on read feels a bit hacky.
What do you think, @rok?
The change looks good to me in principle.
I do agree with @AlenkaF that changing _scan_options seems a bit forced and could have unexpected consequences elsewhere. Can you check if there is a nicer way?
Sounds good. I'm now using a new attribute called _columns instead of relying on _scan_options.
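To make the idea concrete, here is a rough pure-Python sketch of the pattern, not the actual code in the PR. `SketchDataset`, its `data` field, and its `scan` method are hypothetical stand-ins; only the attribute names `_columns` and `_scan_options` come from this discussion. The point is that the new names live in their own attribute and are applied only when the data is materialized, while `_scan_options` is left untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SketchDataset:
    """Hypothetical stand-in for a dataset; this is not the pyarrow class."""

    data: Dict[str, List]  # physical column name -> values
    _scan_options: Dict[str, object] = field(default_factory=dict)
    _columns: Optional[List[str]] = None  # logical names set by rename_columns

    def rename_columns(self, names: List[str]) -> "SketchDataset":
        if len(names) != len(self.data):
            raise ValueError("expected one new name per column")
        # Return a new object; _scan_options is carried over unchanged.
        return replace(self, _columns=list(names))

    def scan(self) -> Dict[str, List]:
        # The rename is applied only here, when the data is materialized.
        names = self._columns or list(self.data)
        return dict(zip(names, self.data.values()))


original = SketchDataset(
    {"col_0": [1, 2], "col_1": ["a", "b"]},
    {"batch_size": 10},
)
renamed = original.rename_columns(["id", "value"])
print(renamed.scan())
```

Keeping the rename separate means any other code that reads `_scan_options` sees the same values before and after the rename.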
@rok @AlenkaF