pandera
`DataFrameSchema.rename_columns` doesn't allow a no-op mapping
Describe the bug
DataFrameSchema.rename_columns doesn't allow an effectively no-op mapping, e.g. passing in a map like {'col1': 'col1'}.
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the master branch of pandera.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
import pandera as pa
schema = pa.DataFrameSchema({"col1": pa.Column()})
schema.rename_columns({"col1": "col1"})
Note: the use case for this is when rename_dict also contains mappings that are not no-ops, e.g. {'col1': 'col1', 'col2': 'col2_new'}.
Result:
Traceback (most recent call last):
File "/Users/henrysorsky/Library/Application Support/JetBrains/PyCharm2022.1/scratches/scratch_18.py", line 4, in <module>
schema.rename_columns({"col1": "col1"})
File "/Users/henrysorsky/pandera/pandera/schemas.py", line 1246, in rename_columns
raise errors.SchemaInitError(
pandera.errors.SchemaInitError: Keys ['col1'] already found in schema columns!
Expected behavior
In the example above, I'd expect the result to be:
<Schema DataFrameSchema(columns={'col1': <Schema Column(name=col1, type=None)>}, checks=[], index=None, coerce=False, dtype=None, strict=False, name=None, ordered=False, unique_column_names=False)>
In general, I'd expect keys that map to themselves in rename_dict to stay unchanged, and all other keys to be renamed.
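To pin down the expected semantics, here is a minimal pure-Python sketch that models a schema's columns as an ordered dict. It is not pandera's implementation. `rename_keys` is a hypothetical helper, and its error types and messages are assumptions that only loosely mirror the traceback above. It treats identity mappings as no-ops and still rejects a genuine rename onto an existing column name:

```python
from typing import Dict, TypeVar

V = TypeVar("V")


def rename_keys(columns: Dict[str, V], rename_dict: Dict[str, str]) -> Dict[str, V]:
    """Hypothetical helper: rename keys, treating self-mappings as no-ops.

    Models the behaviour requested in this issue; not pandera's actual code.
    """
    # Every key being renamed must exist in the current columns.
    missing = [k for k in rename_dict if k not in columns]
    if missing:
        raise KeyError(f"Keys {missing} not found in columns!")

    # Drop identity mappings such as {"col1": "col1"}; they change nothing.
    effective = {k: v for k, v in rename_dict.items() if k != v}

    # A genuine rename must not collide with an existing column name.
    collisions = [v for v in effective.values() if v in columns]
    if collisions:
        raise ValueError(f"Keys {collisions} already found in columns!")

    # Rebuild the mapping, preserving the original column order.
    return {effective.get(name, name): value for name, value in columns.items()}


print(rename_keys({"col1": 1, "col2": 2}, {"col1": "col1", "col2": "col2_new"}))
# {'col1': 1, 'col2_new': 2}
```

Filtering out identity entries before the collision check is what lets {'col1': 'col1'} through. Without that step, 'col1' would be flagged as "already found", which is the error reported above.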
Desktop (please complete the following information):
- OS: macOS
- Browser: N/A
- Version: 12.5
Screenshots
N/A
Additional context
N/A
Hey @hsorsky, just to understand this issue better: if you have
import pandera as pa
schema = pa.DataFrameSchema({"col1": pa.Column()})
schema.rename_columns({"col1": "col1", "col2": "new_col2"})
Why not do schema.rename_columns({"col2": "new_col2"}) instead?
For the particular use case that led me to discover this behaviour, that was exactly the workaround I implemented. However, it seemed odd that column renaming would fail when a column is mapped to itself, since that is in theory a valid operation.
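The workaround of dropping identity entries before calling `rename_columns` can be sketched as a small pure-Python filter. `drop_noop_renames` is a hypothetical helper, not part of pandera. The pandera call appears only in a comment and assumes a schema that actually contains both columns:

```python
from typing import Dict


def drop_noop_renames(rename_dict: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return rename_dict without entries like {"col1": "col1"}."""
    return {old: new for old, new in rename_dict.items() if old != new}


# Hypothetical usage with pandera (not executed here), assuming `schema`
# has both "col1" and "col2" columns:
#   schema = schema.rename_columns(drop_noop_renames({"col1": "col1", "col2": "col2_new"}))
print(drop_noop_renames({"col1": "col1", "col2": "col2_new"}))
# {'col2': 'col2_new'}
```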
https://github.com/unionai-oss/pandera/pull/941