healthcareai-py
Create function to remove columns that are only NaN
Please make sure this works for cols that are 1) NA only and 2) NaN only
- [ ] implement
- [ ] unit tests
Can use this to test:
df = pd.DataFrame({'a':[1, None, 2, 3], 'b':['m', 'f', None, 'f'], 'c':[3, 4, 5, None], 'd':[None, 8, 1, 3], 'label':['Y', 'N', 'Y', 'N']})
Pandas already has built-in functionality for this: DataFrame.dropna(axis=1, how='all')
It drops any columns that are all NaN.
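For a quick check, here is a minimal sketch of that call on a frame that has one NaN-only column and one None-only column. The column names `all_nan` and `all_none` are made up for illustration and are not part of the test frame above.

```python
import numpy as np
import pandas as pd

# Two partially filled columns plus two columns with no data at all.
df = pd.DataFrame({
    'a': [1, None, 2, 3],
    'b': ['m', 'f', None, 'f'],
    'all_nan': [np.nan] * 4,   # hypothetical NaN-only column
    'all_none': [None] * 4,    # hypothetical None-only column
})

# axis=1 works on columns; how='all' drops a column only if every value is missing.
cleaned = df.dropna(axis=1, how='all')
remaining = list(cleaned.columns)
print(remaining)  # ['a', 'b']
```

Columns `a` and `b` survive because each has at least one non-missing value.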
Looks like I can create a class, class DataFramedDropNa(TransformerMixin), in the file healthcareai-py/healthcareai/common/transformers.py.
Then create a unit test in: test/test_df_dropna_class.py
If this sounds like a plan, then I'll proceed as is.
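A rough sketch of what that class might look like, assuming TransformerMixin comes from sklearn.base and that the transformer receives a pandas DataFrame. This is only an illustration of the plan, not the code that ended up in the repository, and the `empty` column is made up for the example.

```python
import pandas as pd
from sklearn.base import TransformerMixin


class DataFramedDropNa(TransformerMixin):
    """Drop columns whose values are all missing (NaN or None)."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step can sit in a pipeline.
        return self

    def transform(self, X, y=None):
        # how='all' removes a column only when every value in it is missing.
        return X.dropna(axis=1, how='all')


df = pd.DataFrame({
    'a': [1, None, 2, 3],
    'empty': [None, None, None, None],  # hypothetical all-missing column
    'label': ['Y', 'N', 'Y', 'N'],
})

# fit_transform is provided by TransformerMixin (fit followed by transform).
result = DataFramedDropNa().fit_transform(df)
print(list(result.columns))  # ['a', 'label']
```

The unit test could assert the same thing: the all-missing column is gone and the partially filled ones are untouched.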
By the way, NumPy does not support NA, hence pandas doesn't either.
I wonder if this should happen during the cardinality checks and at least warn the user.
I think that’s a good idea. I’ll make the changes.
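One way the warning idea could be sketched, assuming a plain warnings.warn is acceptable. The helper names `find_all_missing_columns` and `drop_all_missing_columns` are hypothetical, and how this would hook into the existing cardinality checks is not shown here.

```python
import warnings

import pandas as pd


def find_all_missing_columns(df):
    """Return the names of columns in which every value is missing."""
    return [col for col in df.columns if df[col].isna().all()]


def drop_all_missing_columns(df):
    """Warn about all-missing columns, then return a copy without them."""
    empty = find_all_missing_columns(df)
    if empty:
        warnings.warn(
            'Dropping columns that contain only missing values: {}'.format(empty)
        )
    return df.drop(columns=empty)


frame = pd.DataFrame({
    'a': [1, None, 2, 3],
    'd': [None, None, None, None],  # all-missing column for the example
    'label': ['Y', 'N', 'Y', 'N'],
})
empty_cols = find_all_missing_columns(frame)
print(empty_cols)  # ['d']
```

Keeping detection separate from dropping lets a check report the problem columns without changing the data.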