pyjanitor
[ENH] Should clean_names return a copy of the dataframe?
Just stumbled upon this stackoverflow question:
https://stackoverflow.com/questions/53427376/preserve-original-column-names
It appears the issue is that the clean_names function returns a copy of a dataframe and doesn't preserve other attributes.
Might be something to look into for this and other functions as well?
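To illustrate the behaviour described above, here is a minimal sketch using plain pandas. `clean_names_sketch` is a hypothetical stand-in (strip, lowercase, spaces to underscores), not pyjanitor's actual `clean_names`, and `source` is an ad-hoc attribute assumed for illustration. Because `DataFrame.rename` returns a new object, the original keeps its column names, but the ad-hoc attribute does not carry over to the result:

```python
import pandas as pd


def clean_names_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for a name-cleaning function.
    # NOT pyjanitor's implementation; it only mimics the general idea.
    mapping = {col: str(col).strip().lower().replace(" ", "_") for col in df.columns}
    # DataFrame.rename returns a new DataFrame; the input is left untouched.
    return df.rename(columns=mapping)


df = pd.DataFrame({"First Name": ["a"], "Last Name": ["b"]})
# An ad-hoc instance attribute (not a column, not pandas metadata).
df.source = "survey.csv"

cleaned = clean_names_sketch(df)
print(list(df.columns))            # ['First Name', 'Last Name']
print(list(cleaned.columns))       # ['first_name', 'last_name']
print(hasattr(cleaned, "source"))  # False: the attribute was not copied
```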
Thanks for bringing this up, @szuckerman!
There's definitely inconsistent behaviour w.r.t. copying a dataframe or not inside each function. This is something I think should be fixed at the SciPy sprints this year. I probably will also lead one at PyCon.
This issue basically boils down to a decision: copy by default, or not? We talked a bit about this in #79, but given the new information that in-place mutation is going away in pandas, I think pandas is going to become an API for small/medium data users.
I'm 100% okay deferring this decision beyond SciPy 2019. We don't have to act on this right now, as it's not producing frequent confusion for package users.
At the same time, I super like @zbarry's proposed implementation in #79, which can be used to ensure all functions copy-by-default. I'm sure someone at PyCon/SciPy would be happy to run with it.
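The copy-by-default idea can be sketched with a decorator that hands each wrapped function a copy of its dataframe argument. This is a hypothetical illustration (`copy_by_default` and `add_total` are invented names), not @zbarry's implementation from #79, which isn't reproduced here:

```python
import functools

import pandas as pd


def copy_by_default(func):
    """Hypothetical decorator: pass the wrapped function a copy of its
    DataFrame argument so the caller's object is never mutated."""

    @functools.wraps(func)
    def wrapper(df: pd.DataFrame, *args, **kwargs):
        # df.copy() is a deep copy by default.
        return func(df.copy(), *args, **kwargs)

    return wrapper


@copy_by_default
def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # Written in a mutating style; the decorator keeps the caller's frame safe.
    df["total"] = df["a"] + df["b"]
    return df


original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = add_total(original)
print("total" in original.columns)  # False
print(result["total"].tolist())     # [4, 6]
```

The trade-off is an extra copy on every call. Since `df.copy()` is deep by default, a shallow copy might be preferable for functions that only relabel axes.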
clean_names does not mutate the dataframe, since the columns/index are immutable pandas Index objects: renaming assigns a new Index rather than modifying the existing one. A slice of the dataframe (shallow copy) is all that's needed, and the original dataframe is left unaffected.
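A minimal pandas sketch of this point, assuming default pandas behaviour: an `Index` rejects item assignment, and a shallow copy (`copy(deep=False)`) gets its own axes, so assigning new column labels to it leaves the original's labels untouched. The name-cleaning rule here is an illustrative assumption:

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["a"], "Last Name": ["b"]})

# Index objects are immutable: item assignment raises TypeError.
try:
    df.columns[0] = "first_name"
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# A shallow copy is a new DataFrame object with its own axes, so
# relabeling its columns does not change df's column labels.
out = df.copy(deep=False)
out.columns = [str(c).lower().replace(" ", "_") for c in out.columns]

print(index_is_immutable)  # True
print(list(df.columns))    # ['First Name', 'Last Name']
print(list(out.columns))   # ['first_name', 'last_name']
```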