
`DataFrameSchema.rename_columns` doesn't allow a no-op mapping

Open • hsorsky opened this issue 3 years ago • 1 comment

Describe the bug

`DataFrameSchema.rename_columns` doesn't allow what is effectively a no-op mapping, e.g. passing in a map like `{'col1': 'col1'}`.

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the master branch of pandera.


Code Sample, a copy-pastable example

```python
import pandera as pa

schema = pa.DataFrameSchema({"col1": pa.Column()})
schema.rename_columns({"col1": "col1"})
```

Note: the use case for this is when `rename_dict` also contains mappings that are not no-ops, e.g. `{'col1': 'col1', 'col2': 'col2_new'}`.

Result:
```
Traceback (most recent call last):
  File "/Users/henrysorsky/Library/Application Support/JetBrains/PyCharm2022.1/scratches/scratch_18.py", line 4, in <module>
    schema.rename_columns({"col1": "col1"})
  File "/Users/henrysorsky/pandera/pandera/schemas.py", line 1246, in rename_columns
    raise errors.SchemaInitError(
pandera.errors.SchemaInitError: Keys ['col1'] already found in schema columns!
```
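To illustrate why an identity mapping trips this error, here is a minimal sketch of a collision check, assuming (from the wording of the error message, not from pandera's source) that every target name in the mapping is compared against the existing column names. `naive_collision_check` is a hypothetical function, and it raises `ValueError` where pandera raises `SchemaInitError`.

```python
from typing import Dict, List


def naive_collision_check(columns: List[str], rename_dict: Dict[str, str]) -> None:
    """Hypothetical reconstruction: flag every target name that already exists.

    Assumption: this mirrors the behaviour implied by the error message,
    not pandera's actual implementation.
    """
    already_in_columns = [new for new in rename_dict.values() if new in columns]
    if already_in_columns:
        # An identity mapping like {'col1': 'col1'} always lands here,
        # because the target 'col1' is the column's own current name.
        raise ValueError(f"Keys {already_in_columns} already found in schema columns!")


# Raises ValueError: Keys ['col1'] already found in schema columns!
# naive_collision_check(["col1"], {"col1": "col1"})
```

Under this assumption, the check cannot tell "rename onto a name another column already uses" apart from "keep this column's name", which is exactly the case reported here.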

Expected behavior

In the example case above, the result would be:

```
<Schema DataFrameSchema(columns={'col1': <Schema Column(name=col1, type=None)>}, checks=[], index=None, coerce=False, dtype=None, strict=False, name=None, ordered=False, unique_column_names=False)>
```

In general, I'd expect keys that map to themselves in `rename_dict` to stay the same and all other keys to be renamed.
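A minimal sketch of these semantics in plain Python, operating on a list of column names rather than a real `DataFrameSchema`. `rename_column_names` is a hypothetical helper, not pandera's API and not the change in the linked PR: it drops identity mappings before checking for clashes, so `{'col1': 'col1'}` becomes a no-op, while a rename onto a name that another column keeps is still rejected.

```python
from typing import Dict, List


def rename_column_names(columns: List[str], rename_dict: Dict[str, str]) -> List[str]:
    """Hypothetical helper: apply ``rename_dict`` to ``columns``, treating
    identity mappings such as {'col1': 'col1'} as no-ops."""
    # Every key must name an existing column.
    missing = [old for old in rename_dict if old not in columns]
    if missing:
        raise KeyError(f"Keys {missing} not found in columns")
    # Identity mappings never change a name, so drop them before any clash check.
    effective = {old: new for old, new in rename_dict.items() if old != new}
    renamed = [effective.get(name, name) for name in columns]
    # Reject renames that would leave two columns with the same name, e.g. a
    # target equal to an unchanged column, or two keys sharing one target.
    duplicates = sorted({name for name in renamed if renamed.count(name) > 1})
    if duplicates:
        raise ValueError(f"Renaming would duplicate column names: {duplicates}")
    return renamed
```

For example, `rename_column_names(["col1", "col2"], {"col1": "col1", "col2": "col2_new"})` returns `["col1", "col2_new"]`. Checking the final list for duplicates, rather than checking each target against the original names, also allows swaps such as `{'a': 'b', 'b': 'a'}`.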

Desktop (please complete the following information):

  • OS: macOS
  • Browser: N/A
  • Version: 12.5

Screenshots

N/A

Additional context

N/A

hsorsky, Sep 09 '22 13:09

Hey @hsorsky, just to understand this issue better: if you have

```python
import pandera as pa

schema = pa.DataFrameSchema({"col1": pa.Column()})
schema.rename_columns({"col1": "col1", "col2": "new_col2"})
```

Why not do `schema.rename_columns({"col2": "new_col2"})` instead?

cosmicBboy, Sep 15 '22 14:09

For the particular use case that led me to discover the described behaviour, that was exactly the workaround I implemented. However, it seemed odd that column renaming would fail when a column is mapped to itself, since in theory that is a valid operation.

hsorsky, Sep 25 '22 09:09
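The workaround discussed above can be written as a small helper that filters out identity entries before the mapping reaches `rename_columns`. `drop_identity_mappings` is a hypothetical name used only for illustration.

```python
from typing import Dict


def drop_identity_mappings(rename_dict: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: copy ``rename_dict`` without entries mapping a name to itself."""
    return {old: new for old, new in rename_dict.items() if old != new}
```

For a schema that has both `col1` and `col2`, `schema.rename_columns(drop_identity_mappings({"col1": "col1", "col2": "col2_new"}))` would then only ask pandera to rename `col2`.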

https://github.com/unionai-oss/pandera/pull/941

hsorsky, Sep 27 '22 18:09