pyjanitor

[ENH] Should clean_names return a copy of the dataframe?

Status: Open. szuckerman opened this issue 6 years ago.

Just stumbled upon this Stack Overflow question:

https://stackoverflow.com/questions/53427376/preserve-original-column-names

It appears the issue is that the clean_names function returns a copy of a dataframe and doesn't preserve other attributes.

Might be something to look into for this and other functions as well?
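To make the concern concrete, here is a minimal sketch in plain pandas. It does not use pyjanitor's own clean_names: `simple_clean_names` is a hypothetical stand-in that lowercases column names and replaces spaces with underscores on a copy. The original DataFrame keeps its column names, but an ad-hoc attribute set directly on it (the hypothetical `source`) is not carried over to the returned object.

```python
import pandas as pd


def simple_clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_names: returns a new DataFrame."""
    out = df.copy()
    # Relabel the copy only; the caller's DataFrame keeps its column names.
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    return out


df = pd.DataFrame({"First Name": ["Ada"], "Age": [36]})
df.source = "survey.csv"  # ad-hoc attribute (not a column) set on the original

cleaned = simple_clean_names(df)
print(list(df.columns))           # ['First Name', 'Age']
print(list(cleaned.columns))      # ['first_name', 'age']
print(hasattr(cleaned, "source"))  # False: the ad-hoc attribute was not copied
```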

szuckerman, Apr 09 '19 17:04

Thanks for bringing this up, @szuckerman!

There's definitely inconsistent behaviour w.r.t. copying a dataframe or not inside each function. This is something I think should be fixed at the SciPy sprints this year. I will probably also lead a sprint at PyCon.

This issue basically boils down to a decision: copy by default, or not? We talked a bit about this in #79, but with the new information that in-place mutation is going away, I think pandas is going to become an API for small/medium data users.

I'm 100% okay deferring this decision beyond SciPy 2019. We don't have to act on this right now, as it's not producing frequent confusion for package users.

At the same time, I really like @zbarry's proposed implementation in #79, which can be used to ensure all functions copy by default. I'm sure someone at PyCon/SciPy would be happy to run with it.
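One way the copy-by-default idea could look is a decorator that hands each function a copy of its input. This is not @zbarry's implementation from #79, whose details aren't reproduced in this thread; `copy_by_default` and `add_total` are hypothetical names, and the sketch assumes every decorated function takes the DataFrame as its first positional argument.

```python
import functools

import pandas as pd


def copy_by_default(func):
    """Hypothetical decorator: call func on a copy of the input DataFrame.

    Any in-place changes func makes land on the copy, never on the caller's object.
    """

    @functools.wraps(func)
    def wrapper(df: pd.DataFrame, *args, **kwargs):
        return func(df.copy(), *args, **kwargs)

    return wrapper


@copy_by_default
def add_total(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    # Mutates its argument, but the decorator guarantees that argument is a copy.
    df["total"] = df[cols].sum(axis=1)
    return df


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = add_total(df, ["a", "b"])
print(list(df.columns))          # ['a', 'b']
print(result["total"].tolist())  # [4, 6]
```

A deep copy on every call costs memory proportional to the data, so functions that only relabel axes might get away with a cheaper shallow copy instead.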

ericmjl, Apr 09 '19 19:04

clean_names does not mutate the dataframe: the columns/index are immutable, so new labels replace them rather than modifying them in place. A slice of the dataframe (a shallow copy) is all that's needed, and the original dataframe is left unaffected.
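A minimal sketch of the shallow-copy approach in plain pandas, assuming a hypothetical helper `rename_on_shallow_copy` that lowercases column labels. The comment describes a slice; this sketch uses `DataFrame.copy(deep=False)`, which is an explicit way to get a shallow copy. Assigning new labels to the copy leaves the original DataFrame's column Index untouched.

```python
import pandas as pd


def rename_on_shallow_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: relabel columns on a shallow copy of df."""
    # New DataFrame object; the underlying data is not copied up front.
    out = df.copy(deep=False)
    # Assigning a new Index replaces the copy's labels; the original DataFrame
    # still refers to its own (immutable) Index object.
    out.columns = pd.Index([str(c).lower() for c in df.columns])
    return out


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
renamed = rename_on_shallow_copy(df)
print(list(df.columns))       # ['A', 'B']
print(list(renamed.columns))  # ['a', 'b']
```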

samukweku, Jan 04 '24 10:01