healthcareai-py

Create function to remove columns that are only NaN

Open · levithatcher opened this issue 8 years ago · 4 comments

Please make sure this works for columns that are 1) NA only and 2) NaN only.

  • [ ] implement
  • [ ] unit tests

Can use this to test:

```python
import pandas as pd
df = pd.DataFrame({'a': [1, None, 2, 3], 'b': ['m', 'f', None, 'f'], 'c': [3, 4, 5, None], 'd': [None, 8, 1, 3], 'label': ['Y', 'N', 'Y', 'N']})
```

levithatcher · Aug 15 '16

Pandas already has this built in: DataFrame.dropna(axis=1, how='all').

It drops every column in which all values are missing (NaN or None).
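Note that every column in the test frame above has at least one real value, so dropna(axis=1, how='all') would leave it unchanged. The sketch below adds two illustrative all-missing columns (the names all_none and all_nan are made up for this example) so there is something to drop.

```python
import numpy as np
import pandas as pd

# Test frame from the issue, plus two hypothetical all-missing columns.
df = pd.DataFrame({
    'a': [1, None, 2, 3],
    'b': ['m', 'f', None, 'f'],
    'c': [3, 4, 5, None],
    'd': [None, 8, 1, 3],
    'label': ['Y', 'N', 'Y', 'N'],
    'all_none': [None, None, None, None],  # object column holding only None
    'all_nan': [np.nan] * 4,               # float column holding only NaN
})

# axis=1 targets columns; how='all' drops a column only if every value is missing.
cleaned = df.dropna(axis=1, how='all')
print(list(cleaned.columns))  # ['a', 'b', 'c', 'd', 'label']
```

Partially missing columns such as a and d survive; only the two columns with no values at all are removed.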

It looks like I can create a class, class DataFramedDropNa(TransformerMixin), in healthcareai-py/healthcareai/common/transformers.py.
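A minimal sketch of what such a transformer could look like, assuming TransformerMixin is scikit-learn's sklearn.base.TransformerMixin (which supplies fit_transform on top of your fit and transform). The class name is taken from the comment above, but the body is illustrative and is not healthcareai's actual code; the attribute cols_to_drop_ is a hypothetical name.

```python
import pandas as pd
from sklearn.base import TransformerMixin


class DataFramedDropNa(TransformerMixin):
    """Illustrative sketch: drop columns whose values are all missing (NaN/None)."""

    def fit(self, X, y=None):
        # Record which columns are entirely missing in the fitting data.
        all_missing = X.isna().all()
        self.cols_to_drop_ = all_missing[all_missing].index.tolist()
        return self

    def transform(self, X):
        # Drop the same learned columns; errors='ignore' skips any that are absent.
        return X.drop(columns=self.cols_to_drop_, errors='ignore')
```

Learning the columns in fit() rather than calling dropna inside transform() means that a later prediction batch in which a useful column happens to be entirely missing keeps that column, so its shape still matches what the model was trained on.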

Then I'll create a unit test in test/test_df_dropna_class.py.

If this sounds like a plan, I'll proceed.
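One possible shape for that test module, covering the two cases requested at the top of the issue (a None-only column and a NaN-only column). So that the sketch runs on its own, it calls DataFrame.dropna(axis=1, how='all') through a small stand-in method, drop_all_missing (a hypothetical name); in the real module that method would call the new transformer instead.

```python
import unittest

import numpy as np
import pandas as pd


class TestDropAllMissingColumns(unittest.TestCase):
    """Sketch of test/test_df_dropna_class.py for the cases listed in the issue."""

    def setUp(self):
        # Test frame from the issue; every column has at least one value.
        self.df = pd.DataFrame({'a': [1, None, 2, 3], 'b': ['m', 'f', None, 'f'],
                                'c': [3, 4, 5, None], 'd': [None, 8, 1, 3],
                                'label': ['Y', 'N', 'Y', 'N']})

    def drop_all_missing(self, df):
        # Hypothetical stand-in for the transformer under test.
        return df.dropna(axis=1, how='all')

    def test_keeps_partially_missing_columns(self):
        result = self.drop_all_missing(self.df)
        self.assertEqual(list(result.columns), list(self.df.columns))

    def test_drops_none_only_column(self):
        df = self.df.assign(none_only=[None] * 4)
        self.assertNotIn('none_only', self.drop_all_missing(df).columns)

    def test_drops_nan_only_column(self):
        df = self.df.assign(nan_only=[np.nan] * 4)
        self.assertNotIn('nan_only', self.drop_all_missing(df).columns)
```

Assuming that file path, it can be run with python -m unittest test/test_df_dropna_class.py.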

mxlei01 · Jul 26 '17

By the way, NumPy has no separate NA value, only NaN, so pandas doesn't either; it treats None and NaN alike as missing values.
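That held for the pandas of 2017. pandas 1.0 later added a dedicated pd.NA scalar, and it is also treated as missing, so the same dropna call covers an NA-only column as well. A short check (the column names are illustrative):

```python
import numpy as np
import pandas as pd

# 'na_only' holds pd.NA (pandas >= 1.0); 'nan_only' holds NumPy NaN.
df = pd.DataFrame({'x': [1, 2], 'na_only': [pd.NA, pd.NA], 'nan_only': [np.nan, np.nan]})

kept = df.dropna(axis=1, how='all')
print(list(kept.columns))  # ['x']
```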

mxlei01 · Jul 26 '17

I wonder if this should happen during the cardinality checks and at least warn the user.
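One way to act on this is to drop the all-missing columns while emitting a warning that names them. This is a standalone sketch using Python's warnings module; the helper name drop_all_missing_columns is hypothetical, and how it would hook into healthcareai's cardinality checks is not shown here.

```python
import warnings

import pandas as pd


def drop_all_missing_columns(df):
    """Hypothetical helper: drop columns with no values and warn about each one."""
    all_missing = df.isna().all()
    dropped = all_missing[all_missing].index.tolist()
    if dropped:
        # Tell the user which columns were removed instead of dropping them silently.
        warnings.warn(
            'Dropping columns that contain no values: {}'.format(
                ', '.join(str(col) for col in dropped)),
            UserWarning,
        )
    return df.drop(columns=dropped)
```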

Aylr · Oct 06 '17

I think that’s a good idea. I’ll make the changes.

mxlei01 · Oct 07 '17