Problem with cont_cat_split() in Google Colab in section 09_tabular.ipynb
Hi, I am using Google Colab. I am trying to run 09_tabular.ipynb (fastai/fastbook/blob/master/clean/09_tabular.ipynb), and starting today the line

    cont,cat = cont_cat_split(df, 1, dep_var=dep_var)

throws the following error:

    TypeError                                 Traceback (most recent call last)
    in ()
    ----> 1 cont,cat = cont_cat_split(df, 1, dep_var=dep_var)

    1 frames
    /usr/local/lib/python3.6/dist-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
        386     """
        387     if not issubclass_(arg1, generic):
    --> 388         arg1 = dtype(arg1).type
        389     if not issubclass_(arg2, generic):
        390         arg2 = dtype(arg2).type

    TypeError: Cannot interpret 'UInt32Dtype()' as a data type
I have run this notebook before and it ran fine. If I work around the problem manually by changing the datatype to int64, it throws a similar error for ProductSize, which is a "category" column. Please let me know if you need any additional information. Thanks, Maxim
I found the same issue, and it looks like this was introduced recently: https://github.com/fastai/fastai/pull/3117
It seems that np.issubdtype doesn't play well with pandas types such as UInt32Dtype(). Using pd.api.types.is_integer_dtype and is_float_dtype seems to fix it for me.
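To illustrate the difference, here is a minimal sketch on a toy Series (not the notebook's data) that runs both checks on a pandas nullable integer column. Whether `np.issubdtype` raises may depend on the installed NumPy and pandas versions, so that call is wrapped in a try/except rather than assumed to fail.

```python
import numpy as np
import pandas as pd

# Toy column using pandas' nullable unsigned integer extension dtype.
s = pd.Series([1, 2, None], dtype="UInt32")

# The pandas helpers understand extension dtypes such as UInt32Dtype().
pandas_says_int = pd.api.types.is_integer_dtype(s.dtype)
pandas_says_float = pd.api.types.is_float_dtype(s.dtype)
print(pandas_says_int, pandas_says_float)

# NumPy's check may not: in the traceback above it raised a TypeError.
try:
    numpy_result = np.issubdtype(s.dtype, np.integer)
except TypeError as err:
    numpy_result = f"TypeError: {err}"
print(numpy_result)
```

The pandas helpers report the column as an integer type and not a float type, which is what `cont_cat_split` needs in order to decide between continuous and categorical.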
I'll create a PR but this should get you/others going in the meantime:
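    import pandas as pd
    from fastcore.foundation import L  # fastcore's list type, used by fastai

    def cont_cat_split(df, max_card=20, dep_var=None):
        "Helper function that returns column names of cont and cat variables from given `df`."
        cont_names, cat_names = [], []
        for label in df:
            # Skip the dependent variable(s); L() accepts a single name or a list
            if label in L(dep_var): continue
            # Continuous: integer columns with more than `max_card` unique values, or any float column
            if ((pd.api.types.is_integer_dtype(df[label].dtype) and
                 df[label].unique().shape[0] > max_card) or
                    pd.api.types.is_float_dtype(df[label].dtype)):
                cont_names.append(label)
            else: cat_names.append(label)
        return cont_names, cat_names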
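As a quick sanity check, the same logic can be tried on a toy DataFrame. This is only a sketch: `cont_cat_split_sketch` is a hypothetical, dependency-free copy in which fastcore's `L` is replaced by a plain list so the snippet runs without fastai, and every column name except `ProductSize` is made up for illustration.

```python
import pandas as pd

def cont_cat_split_sketch(df, max_card=20, dep_var=None):
    "Hypothetical stand-alone variant: same test as above, without fastcore's L."
    # Normalise dep_var (None, a single name, or a list of names) to a list
    if dep_var is None: skip = []
    elif isinstance(dep_var, str): skip = [dep_var]
    else: skip = list(dep_var)
    cont_names, cat_names = [], []
    for label in df:
        if label in skip: continue
        is_int = pd.api.types.is_integer_dtype(df[label].dtype)
        is_float = pd.api.types.is_float_dtype(df[label].dtype)
        # Integer columns count as continuous only above the cardinality threshold
        if (is_int and df[label].unique().shape[0] > max_card) or is_float:
            cont_names.append(label)
        else:
            cat_names.append(label)
    return cont_names, cat_names

# Toy data: a nullable UInt32 column, a float column, a category column, a target
df = pd.DataFrame({
    "uint_col": pd.Series([10, 20, 30, 40], dtype="UInt32"),
    "float_col": [1.5, 2.0, 3.5, 4.0],
    "ProductSize": pd.Categorical(["Small", "Large", "Small", "Medium"]),
    "target": [9.1, 9.5, 10.2, 10.0],
})

# max_card=1 as in the notebook call: any integer column with >1 unique value is continuous
cont, cat = cont_cat_split_sketch(df, 1, dep_var="target")
print(cont, cat)

# With the default max_card=20 the 4-valued integer column is treated as categorical
cont_default, cat_default = cont_cat_split_sketch(df, dep_var="target")
print(cont_default, cat_default)
```

Neither the `UInt32` column nor the `category` column triggers a `TypeError` here, because the dtype checks go through pandas rather than `np.issubdtype`.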
The fix was released with fastai 2.2.5. If you install with pip, be aware of https://github.com/fastai/fastai/issues/3220
The function definition is duplicated in 09_tabular.ipynb. Not sure if there has been a PR to update the notebook itself yet?
Apart from other problems (related to https://github.com/fastai/fastai/pull/3230), the notebook looks good. It has no embedded copy of the function.
Ah, quite right, I think I confused it with my local copy.
The fix was actually released in 2.2.4, but that release was affected by the same issue. Hopefully 2.2.6 will update PyPI 👍
As of today the notebook should be fine with fastai v2.2.7.
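If you are unsure which version your environment actually picked up, a small check like the one below may help. It is a sketch, not part of fastai: `parse_version` and `has_fix` are hypothetical helpers, the 2.2.7 threshold is taken from this thread, and `importlib.metadata` assumes Python 3.8 or newer.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (2, 2, 7)  # assumption: the version reported as working in this thread

def parse_version(text):
    "Turn a string such as '2.2.7' into (2, 2, 7); simplistic, ignores suffixes."
    matches = (re.match(r"\d+", part) for part in text.split(".")[:3])
    return tuple(int(m.group()) for m in matches if m)

def has_fix(installed):
    "True if the installed version string is at least MIN_VERSION."
    return parse_version(installed) >= MIN_VERSION

try:
    installed = version("fastai")
    print(installed, "ok" if has_fix(installed) else "upgrade needed")
except PackageNotFoundError:
    print("fastai is not installed")
```

If the check reports an older version, upgrading the package (for example with pip) and restarting the runtime should pick up the fixed function.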