Mixed column stops at "Get scatter matrix"

Open salmanea opened this issue 3 years ago • 5 comments

I have a dataset of about 250K rows and 28 columns. When I run the profiler, it stops at 83%, at "Get scatter matrix" (see the attached screenshot).

To Reproduce

Version information:

Additional context

salmanea avatar May 18 '21 13:05 salmanea

It's still computing (see *); it is just slow for these 28 × 28 = 784 plots. You can either turn this off or limit it to 28 × n, where n is the number of target variables. See this page
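As a rough illustration of that arithmetic, the sketch below counts the pairwise scatter plots for a full matrix and for a restricted set of targets. The helper name `scatter_plot_count` is hypothetical, and it assumes every column is treated as continuous, which may not hold for a real dataset.

```python
def scatter_plot_count(n_columns, n_targets=None):
    """Number of pairwise scatter plots: every column against every target.

    With no targets given, every column is paired with every column.
    """
    if n_targets is None:
        n_targets = n_columns
    return n_columns * n_targets


# 28 columns against all 28 columns, as in the issue.
full_matrix = scatter_plot_count(28)

# 28 columns against 2 chosen target columns (2 is an arbitrary example).
limited = scatter_plot_count(28, n_targets=2)

print(full_matrix, limited)
```

Restricting the targets makes the number of plots grow linearly with the column count instead of quadratically, which is why it shortens this step.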

sbrugman avatar May 18 '21 14:05 sbrugman

Thanks. I realized that it has a problem with a column that contained mixed letters and numbers. I dropped the column and it worked OK.

salmanea avatar May 19 '21 13:05 salmanea

I have the same problem. My dataset has 5k records and 31 columns. When profiling starts, the process gets stuck at "Summarize dataset: 81% / Get scatter matrix". There are no mixed-type columns in the dataset; I converted all columns to float via pandas' astype(float). We want to use Pandas Profiling in our product, but issues like this cause trouble at the PoC stage.

enesMesut avatar Jun 15 '21 09:06 enesMesut

@enesMesut Will be improved in the next version. For now either turn the scatterplots off (interactions={'continuous': False}) or select particular columns for which you're interested in obtaining them (interactions={'targets':['col1', 'col2']})
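A minimal sketch of how these two settings might be passed to the report is shown below. The helpers `interactions_config` and `make_report` are hypothetical wrappers; only the `interactions` values come from the comment above. The import is deferred and tries both package names, on the assumption that one of `ydata_profiling` or `pandas_profiling` is installed.

```python
def interactions_config(targets=None):
    """Build the `interactions` setting quoted in the comment above.

    With no targets, scatter plots are turned off; otherwise they are
    limited to the given target columns.
    """
    if targets is None:
        return {"continuous": False}
    return {"targets": list(targets)}


def make_report(df, targets=None):
    """Create a profile report for `df` with the chosen interactions setting."""
    # Deferred import: the package was renamed, so try the newer name first.
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        from pandas_profiling import ProfileReport
    return ProfileReport(df, interactions=interactions_config(targets))


print(interactions_config())
print(interactions_config(["col1", "col2"]))
```

Calling `make_report(df)` would skip the scatter matrix entirely, while `make_report(df, targets=["col1", "col2"])` would keep it only for the named columns.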

sbrugman avatar Jun 15 '21 09:06 sbrugman

@sbrugman Thanks for the fix. It worked. Please let us know when you find a complete solution to the problem. One more thing: dates are not currently supported. It would be great if they could also be summarized in the profiling.

salmanea avatar Jun 17 '21 06:06 salmanea