
How to use large datasets? (Dask?)

fernaper opened this issue 4 years ago • 1 comment

Hi, I'm trying to train models with really large datasets, up to 100 GB.

Is MLJAR integrated with Dask? If so, do you know how to use it? If not, how can I parallelize training or handle datasets of this size?

Best regards

fernaper avatar Oct 19 '21 09:10 fernaper

Hi @fernaper,

MLJAR is not integrated with Dask. I've never used Dask and don't know how the integration would work; I will need to check.

What I do in such cases is use a machine with a lot of RAM (in the cloud). But first I would downsample the dataset to get a proof of concept that ML works on the data.
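A minimal sketch of that downsampling step, assuming the data lives in a CSV too large to load at once. The function name `downsample_csv` and all parameters are hypothetical (not part of mljar-supervised); it streams the file in chunks with pandas and keeps a random fraction of rows, so memory use stays bounded by the chunk size:

```python
import pandas as pd

def downsample_csv(path, out_path, frac=0.01, chunksize=1_000_000, seed=42):
    # Hypothetical helper, not part of mljar-supervised.
    # Stream the CSV in chunks and keep a random fraction of each chunk,
    # so the full file is never held in memory at once.
    sampled = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        sampled.append(chunk.sample(frac=frac, random_state=seed))
    pd.concat(sampled).to_csv(out_path, index=False)
```

The resulting small CSV can then be loaded normally and passed to `AutoML().fit()` as a proof of concept before renting a high-RAM machine for the full run.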

pplonski avatar Oct 19 '21 11:10 pplonski