
Problem with cont_cat_split() in Google Colab in section 09_tabular.ipynb

Open maximgreenberg opened this issue 4 years ago • 6 comments

Hi, I am using Google Colab. I am trying to run 09_tabular.ipynb (fastai/fastbook/blob/master/clean/09_tabular.ipynb), and as of today the call cont,cat = cont_cat_split(df, 1, dep_var=dep_var) throws the following error:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 cont,cat = cont_cat_split(df, 1, dep_var=dep_var)

1 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
    386     """
    387     if not issubclass_(arg1, generic):
--> 388         arg1 = dtype(arg1).type
    389     if not issubclass_(arg2, generic):
    390         arg2 = dtype(arg2).type

TypeError: Cannot interpret 'UInt32Dtype()' as a data type

I have run this notebook before and it ran fine. If I work around the problem manually by changing the datatype to int64, it throws a similar error for ProductSize, which is a "category". Please let me know if you need any additional information. Thanks, Maxim

maximgreenberg avatar Jan 09 '21 09:01 maximgreenberg

I found the same issue, and it looks like this was introduced recently: https://github.com/fastai/fastai/pull/3117

It seems that np.issubdtype doesn't play well with pandas types such as UInt32Dtype(). Using pd.api.types.is_integer_dtype and is_float_dtype seems to fix it for me.
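As a rough illustration of the difference, the sketch below builds a small Series with the nullable UInt32 dtype and queries it both ways. The Series contents are made up for the example, and the NumPy call is wrapped in try/except because its behaviour with pandas extension dtypes may vary between versions.

```python
import numpy as np
import pandas as pd

# Toy data (not from the notebook): a column using pandas' nullable UInt32 dtype.
s = pd.Series([1, 2, 3], dtype="UInt32")

# NumPy tries to convert the pandas dtype object into a NumPy dtype first,
# which is where the TypeError in the traceback comes from.
try:
    np_result = np.issubdtype(s.dtype, np.integer)
except TypeError as err:
    np_result = f"TypeError: {err}"

# The pandas helpers understand extension dtypes directly.
pd_int = pd.api.types.is_integer_dtype(s.dtype)
pd_float = pd.api.types.is_float_dtype(s.dtype)

print("np.issubdtype:", np_result)
print("is_integer_dtype:", pd_int)
print("is_float_dtype:", pd_float)
```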

I'll create a PR but this should get you/others going in the meantime:

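import pandas as pd
from fastcore.foundation import L  # list helper used by fastai

def cont_cat_split(df, max_card=20, dep_var=None):
    "Helper function that returns column names of cont and cat variables from given `df`."
    cont_names, cat_names = [], []
    for label in df:
        if label in L(dep_var): continue  # skip the dependent variable(s)
        # Continuous: integer columns with more than `max_card` unique values, or any float column.
        # The pandas helpers also accept extension dtypes such as UInt32Dtype().
        if ((pd.api.types.is_integer_dtype(df[label].dtype) and
             df[label].unique().shape[0] > max_card) or
            pd.api.types.is_float_dtype(df[label].dtype)):
            cont_names.append(label)
        else: cat_names.append(label)
    return cont_names, cat_names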
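
A quick way to sanity-check the workaround is to run it on a toy frame that mixes a UInt32 column, a category column, and floats. The sketch below is an assumption-laden stand-in: the function name cont_cat_split_sketch is hypothetical, fastcore's L is replaced by a plain list so the snippet runs without fastai installed, and all column values are invented (only the ProductSize name comes from the report above).

```python
import pandas as pd

def cont_cat_split_sketch(df, max_card=20, dep_var=None):
    "Stand-in for the workaround above; uses a plain list instead of fastcore's L."
    if dep_var is None:
        skip = []
    elif isinstance(dep_var, str):
        skip = [dep_var]
    else:
        skip = list(dep_var)

    cont_names, cat_names = [], []
    for label in df:
        if label in skip:
            continue
        dtype = df[label].dtype
        # Integer columns count as continuous only above the cardinality threshold.
        high_card_int = (pd.api.types.is_integer_dtype(dtype)
                         and len(df[label].unique()) > max_card)
        if high_card_int or pd.api.types.is_float_dtype(dtype):
            cont_names.append(label)
        else:
            cat_names.append(label)
    return cont_names, cat_names

# Invented toy data with the dtypes mentioned in this thread.
df = pd.DataFrame({
    "ModelID": pd.array([10, 20, 30, 40], dtype="UInt32"),
    "ProductSize": pd.Categorical(["Small", "Large", "Small", "Medium"]),
    "MachineHours": [1.5, 2.0, 0.5, 3.25],
    "SalePrice": [9.5, 10.1, 9.9, 10.4],
})

cont, cat = cont_cat_split_sketch(df, 1, dep_var="SalePrice")
print(cont, cat)
```

With max_card set to 1, as in the failing call, the UInt32 column lands in the continuous list and the category column in the categorical list, without any dtype conversion.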

chrismilleruk avatar Jan 12 '21 20:01 chrismilleruk

The fix was released with fastai 2.2.5. If you install with pip, be aware of https://github.com/fastai/fastai/issues/3220

aberres avatar Feb 19 '21 10:02 aberres

The function definition is duplicated in 09_tabular.ipynb. Not sure if there has been a PR to update the notebook itself yet?

chrismilleruk avatar Feb 19 '21 10:02 chrismilleruk

Apart from other problems (related to https://github.com/fastai/fastai/pull/3230) the notebook looks good. No embedded copy.

aberres avatar Feb 19 '21 13:02 aberres

Ah, quite right, I think I confused it with my local copy.

The fix was actually released in 2.2.4, but that release suffered from the same pip issue. Hopefully 2.2.6 will update PyPI 👍

chrismilleruk avatar Feb 19 '21 18:02 chrismilleruk

As of today the notebook should be fine with fast.ai v2.2.7.
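
To confirm that an environment is at or above that version, something like the following sketch could be used. The helper names version_tuple and has_fix are hypothetical, and the 2.2.7 threshold is simply the version mentioned in this thread.

```python
import re

def version_tuple(version):
    "Extract the leading major.minor.patch numbers from a version string."
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())

def has_fix(installed, minimum="2.2.7"):
    "True if `installed` is at least `minimum` (threshold taken from this thread)."
    return version_tuple(installed) >= version_tuple(minimum)

# In a notebook with fastai installed, one would pass the real version:
#   import fastai
#   has_fix(fastai.__version__)
print(has_fix("2.2.3"), has_fix("2.2.7"))
```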

aberres avatar Feb 23 '21 06:02 aberres