sweetviz

support big data use cases

Open yair4Data opened this issue 5 years ago • 3 comments

Great library! It would be great if there were support for big data use cases (integration with dask/vaex/spark). My dataset is too large to fit in memory and is heavily imbalanced, so to keep the original target ratio I need to work with the full dataset rather than downsample it.

yair4Data avatar Feb 15 '21 09:02 yair4Data

Hello @yair4Data, thank you for the kind words! I hope the library can be useful to you!

Are you saying you are running out of memory when converting to a pandas data frame (e.g. df = df.compute() in dask)?

Or are you getting an error message when running the report, or generating HTML?

fbdesignpro avatar Feb 15 '21 22:02 fbdesignpro

I have the same problem too. My data has about a billion rows, and it does not work! Can I use the modin package?

haiyuni avatar Mar 12 '21 02:03 haiyuni

@haiyuni I haven't looked at modin, I will do so and get back here.

Regarding the billion row issue, I am assuming you are referring to the scale issue (#73)? Or is there a specific error I should be looking at?

Thanks again!

fbdesignpro avatar Mar 12 '21 17:03 fbdesignpro