pyjanitor icon indicating copy to clipboard operation
pyjanitor copied to clipboard

[BUG] Fix duplicate `test_remove_columns_strange_cols()`

Open hectormz opened this issue 5 years ago • 5 comments

Brief Description

  • There are two test_remove_columns_strange_cols(). The second one should be testing (and named?) for multilevel dataframes.

  • Currently the second defined test runs, but when renamed, the first test fails. Perhaps @dave-frazzetto or someone else could look into why the first test fails.

hectormz avatar Feb 29 '20 08:02 hectormz

@hectormz if necessary, please feel free to disable the test by marking it as pytest.mark.skip, and then note in the reason field why (and saying "Not sure why it's failing" is totally ok). We can then repurpose this issue as a code sprint issue.

ericmjl avatar Feb 29 '20 19:02 ericmjl

@ericmjl will do. Should I leave this issue open until the sprint, or just remember to revisit any skipped tests at that time?

hectormz avatar Feb 29 '20 20:02 hectormz

@hectormz whoopsies, I totally left this thread hanging!

Yes, let's leave this issue open for the time being. Seeing as there's no sprint, we can just leave it hanging for the moment. (I don't mind having long-standing issues on the repo at this moment in time, though that might change.)

ericmjl avatar May 21 '20 12:05 ericmjl

No worries, I forgot about it too!

hectormz avatar May 22 '20 00:05 hectormz

@ericmjl I looked at it a bit. So this was introduced in #217 . At that point, both tests were added, and the second test overwrote the first, so there was no issue.

The test in question fails, because we're trying to drop a list containing a string and another list:

def test_remove_columns_strange_cols(dataframe):
    df = dataframe.remove_columns(
        column_names=[
            "a",
            ["Bell__Chart", "decorated-elephant", "animals@#$%^", "cities"],
        ]
    )

The expected behavior is to have all of these columns dropped.

But pandas drop() seems to only be able to drop a string, or a list of strings.

I'm not sure what an acceptable solution to this (one is to delete the test 😊). We could flatten any sequences intended to be dropped. But the test underneath is intended to test the ability to drop multilevel dataframe columns:

    df = multilevel_dataframe.remove_columns(
        column_names=[("bar", "one"), ("baz", "two")]
    )

Something like this would satisfy the tests:

    for col in column_names:
        df.drop(columns=col, inplace=True)
    return df

But I don't think we want to do inplace=True, and we also need to check if column_names is a str, and not a sequence, and needs to be packaged in a list first.

Incidentally, I don't think #189 was ever closed as a result of the original PR, although it should have been.

Function: https://github.com/ericmjl/pyjanitor/blob/56c6f48cab5f1c2878b5ee8e4c9f5b0d17f3735c/janitor/functions.py#L1522

Tests: https://github.com/ericmjl/pyjanitor/blob/7333b0ce1e28bb1422c89b30810e54f2aa6d2541/tests/functions/test_remove_columns.py#L36

hectormz avatar May 23 '20 02:05 hectormz