pyjanitor
pyjanitor copied to clipboard
[BUG] Fix duplicate `test_remove_columns_strange_cols()`
Brief Description
-
There are two
test_remove_columns_strange_cols(). The second one should be testing (and named?) for multilevel dataframes. -
Currently the second defined test runs, but when renamed, the first test fails. Perhaps @dave-frazzetto or someone else could look into why the first test fails.
@hectormz if necessary, please feel free to disable the test by marking it as pytest.mark.skip, and then note in the reason field why (and saying "Not sure why it's failing" is totally ok). We can then repurpose this issue as a code sprint issue.
@ericmjl will do. Should I leave this issue open until the sprint, or just remember to revisit any skipped tests at that time?
@hectormz whoopsies, I totally left this thread hanging!
Yes, let's leave this issue open for the time being. Seeing as there's no sprint, we can just leave it hanging for the moment. (I don't mind having long-standing issues on the repo at this moment in time, though that might change.)
No worries, I forgot about it too!
@ericmjl I looked at it a bit. So this was introduced in #217 . At that point, both tests were added, and the second test overwrote the first, so there was no issue.
The test in question fails, because we're trying to drop a list containing a string and another list:
def test_remove_columns_strange_cols(dataframe):
df = dataframe.remove_columns(
column_names=[
"a",
["Bell__Chart", "decorated-elephant", "animals@#$%^", "cities"],
]
)
The expected behavior is to have all of these columns dropped.
But pandas drop() seems to only be able to drop a string, or a list of strings.
I'm not sure what an acceptable solution to this (one is to delete the test 😊). We could flatten any sequences intended to be dropped. But the test underneath is intended to test the ability to drop multilevel dataframe columns:
df = multilevel_dataframe.remove_columns(
column_names=[("bar", "one"), ("baz", "two")]
)
Something like this would satisfy the tests:
for col in column_names:
df.drop(columns=col, inplace=True)
return df
But I don't think we want to do inplace=True, and we also need to check if column_names is a str, and not a sequence, and needs to be packaged in a list first.
Incidentally, I don't think #189 was ever closed as a result of the original PR, although it should have been.
Function: https://github.com/ericmjl/pyjanitor/blob/56c6f48cab5f1c2878b5ee8e4c9f5b0d17f3735c/janitor/functions.py#L1522
Tests: https://github.com/ericmjl/pyjanitor/blob/7333b0ce1e28bb1422c89b30810e54f2aa6d2541/tests/functions/test_remove_columns.py#L36