Jupyter kernel abort when plotting a column with pandas type "category"

Open dhuntley1023 opened this issue 4 years ago • 5 comments

I ran into the issue documented in Issue #230 and thought that marking the column as dtype=category in pandas would hint to the dataprep plot that the column is nominal rather than continuous. However, doing so reliably causes a Jupyter kernel abort, with no useful messages printed to the shell where jupyter notebook was started.

Repro:

  • Create a new environment with `conda create -n test dataprep notebook=6.1.5` (note: this notebook version is required to get around the issue I mentioned in #396)
  • Start the notebook and run the code below
  • After several seconds, a message displays 'The kernel appears to have died. It will restart automatically.'
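import pandas as pd
import dataprep.eda as dp

# Integer-coded column that should be treated as nominal
df = pd.DataFrame({'Continuous': [1, 2, 3, 4, 5], 'Categorical': [0, 1, 1, 0, 0]})
df['Categorical'] = df['Categorical'].astype('category')
df.info()
dp.plot(df, 'Categorical')  # plot the category column; the kernel dies several seconds later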

dhuntley1023 avatar Dec 30 '20 01:12 dhuntley1023

@dhuntley1023 Thanks for reporting this! It sounds like the crash happens in C code, in either pandas or dask. @brandonlockhart I think you are more familiar with the categorical support than I am. Do you have any thoughts?

dovahcrow avatar Dec 30 '20 01:12 dovahcrow

I cannot reproduce the error. Can you, @dovahcrow? Note that I use Jupyter installed with pip, not conda. @dhuntley1023 could you please try .astype('object') rather than .astype('category') to convert the column to nominal values and see if that works?

brandonlockhart avatar Dec 30 '20 02:12 brandonlockhart

When I use .astype('object'), it works normally (i.e. no crashes). Would it be helpful to get a dump of the package versions in my environment?

dhuntley1023 avatar Dec 30 '20 04:12 dhuntley1023

Thanks for following up @dhuntley1023!

> Would it be helpful to get a dump of the package versions in my environment?

Yes, that would be helpful, thanks.
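
For anyone collecting the same information, here is a minimal sketch of one way to dump the relevant versions from inside the notebook kernel, using only the standard library. The package list is an assumption (these are the libraries mentioned in this thread plus numpy) and `package_versions` is a hypothetical helper, not part of dataprep.

```python
from importlib import metadata

# Assumed list of distributions worth reporting; adjust to your environment.
PACKAGES = ["dataprep", "pandas", "dask", "numpy", "notebook"]


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

In a conda environment, `conda list` gives the fuller picture, including non-Python dependencies.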

I'm unsure why category is causing a problem. Under the hood, dataprep could convert category to object, as we do for the string type (#377), but I think some more investigation would be useful to see if we can figure out the underlying problem.
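
Until then, the conversion can be done on the user side before plotting. The following is a rough sketch using pandas only; `categories_to_object` is a hypothetical helper written for illustration and is not how dataprep handles this internally.

```python
import pandas as pd


def categories_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every category column cast to object."""
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype("object")
    return out


# Same frame as in the repro above.
df = pd.DataFrame({"Continuous": [1, 2, 3, 4, 5], "Categorical": [0, 1, 1, 0, 0]})
df["Categorical"] = df["Categorical"].astype("category")

converted = categories_to_object(df)
print(converted.dtypes)
```

Passing `converted` to `dp.plot(converted, 'Categorical')` then corresponds to the `.astype('object')` workaround reported earlier in this thread; the original `df` is left unchanged.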

brandonlockhart avatar Dec 30 '20 08:12 brandonlockhart

Let's see whether this problem is fixed in a newer release.

dovahcrow avatar Jan 05 '21 23:01 dovahcrow