dataprep
Jupyter kernel abort when plotting a column with pandas type "category"
I ran into the issue documented in Issue #230 and thought that if I marked the column as dtype=category in pandas, the dataprep plot would take that as a hint that the column was nominal rather than continuous. However, when I do this, it reliably causes a Jupyter kernel abort, with no useful messages printed to the shell where jupyter notebook was started.
Repro:
- Create a new environment with `conda create -n test dataprep notebook=6.1.5` (note: this notebook version is required to get around the issue I mentioned in #396)
- Start the notebook and run the code below
- After several seconds, a message displays 'The kernel appears to have died. It will restart automatically.'
import pandas as pd
import dataprep.eda as dp
df = pd.DataFrame({'Continuous': [1, 2, 3, 4, 5], 'Categorical': [0, 1, 1, 0, 0]})
df['Categorical'] = df['Categorical'].astype('category')
df.info()
dp.plot(df, 'Categorical')
@dhuntley1023 Thanks for reporting! This sounds like an issue in C code, in either pandas or dask. @brandonlockhart I think you are more familiar with the categorical support than I am. Do you have any thoughts?
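Since the kernel dies without printing anything, one way to capture more detail might be Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as a segfault. This is only a diagnostic sketch: the log file name is a hypothetical choice, and it assumes the crash is a signal-level fault rather than, say, the process being killed for running out of memory.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
crash_log_path = os.path.join(tempfile.gettempdir(), "kernel_crash.log")

# Keep the file object alive for the life of the kernel: faulthandler writes
# straight to its file descriptor when a fatal signal arrives.
crash_log = open(crash_log_path, "w")
faulthandler.enable(file=crash_log, all_threads=True)

print("Fault tracebacks will be written to", crash_log_path)

# Now run the failing cell, e.g. dp.plot(df, 'Categorical'), and inspect
# the log file after the kernel restarts.
```

Run this in the first notebook cell, before the repro code. If the log is still empty after the kernel dies, the crash is probably not a signal that `faulthandler` can intercept.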
I cannot reproduce the error, can you @dovahcrow? However, I use Jupyter installed with pip, not conda. @dhuntley1023 could you please try `.astype('object')` rather than `.astype('category')` to convert a column to nominal values and see if that works?
When I use `.astype('object')`, it works normally (i.e. no crashes). Would it be helpful to get a dump of the package versions in my environment?
Thanks for following up @dhuntley1023!
> Would it be helpful to get a dump of the package versions in my environment?
Yes, that would be helpful, thanks.
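For reference, a version dump can be produced from inside the notebook with the standard library alone. This is a minimal sketch assuming Python 3.8 or newer (for `importlib.metadata`); the package list is a guess at what is likely relevant here, not an official list of dataprep's dependencies, and `collect_versions` is a hypothetical helper name.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages worth reporting; adjust as needed.
PACKAGES = ["dataprep", "pandas", "dask", "numpy", "bokeh", "notebook"]


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


print("python", sys.version.split()[0], "on", platform.platform())
for name, ver in collect_versions(PACKAGES).items():
    print(f"{name}=={ver}")
```

Running `conda list` in the activated environment gives a fuller picture, including non-Python libraries that a C-level crash might involve.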
I'm unsure why `category` is causing a problem. Under the hood of dataprep we can convert `category` to `object`, like we do for the string type (#377), but I think some more investigation would be useful to see if we can figure out the problem.
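Until the root cause is found, the conversion can be done on the user side before plotting. The sketch below is a user-level workaround under that assumption, not dataprep's internal implementation; `categoricals_to_object` is a hypothetical helper name.

```python
import pandas as pd


def categoricals_to_object(df):
    """Return a copy of df with every 'category' column cast to object."""
    out = df.copy()
    cat_cols = out.select_dtypes(include="category").columns
    for col in cat_cols:
        out[col] = out[col].astype(object)
    return out


df = pd.DataFrame({"Continuous": [1, 2, 3, 4, 5], "Categorical": [0, 1, 1, 0, 0]})
df["Categorical"] = df["Categorical"].astype("category")

converted = categoricals_to_object(df)
print(converted.dtypes)

# The converted frame can then be passed to the plot call from the repro,
# e.g. dp.plot(converted, 'Categorical'), which was reported above to work
# with object columns.
```

The helper returns a copy, so the original frame keeps its `category` dtype for any other code that relies on it.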
Let's see whether this problem is fixed in a newer release.