mljar-supervised
How to use large datasets? (Dask?)
Hi, I'm trying to train models on really large datasets, up to 100 GB.
Is MLJAR integrated with Dask? If so, do you know how to use it? If not, how can I parallelize training or otherwise handle datasets of this size?
Best regards
Hi @fernaper,
MLJAR is not integrated with Dask. I've never used Dask, so I don't know how to do the integration; I will need to check.
What I do in such cases is use a machine with a lot of RAM (in the cloud). But first I would downsample the dataset to get a proof of concept that ML works on the data.
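For the downsampling step, here is a minimal sketch using pandas, assuming the data is stored in a single CSV file. The function name `sample_csv_in_chunks` and its parameters are hypothetical and are not part of MLJAR. It reads the file chunk by chunk and keeps a random fraction of each chunk, so the full dataset never has to fit in memory at once.

```python
import pandas as pd


def sample_csv_in_chunks(path, frac=0.01, chunksize=100_000, seed=42):
    """Return a random sample of roughly `frac` of the rows of a CSV file.

    Hypothetical helper (not part of MLJAR). The file is read in chunks,
    so only one chunk plus the accumulated sample is held in memory.
    """
    sampled_parts = []
    for i, chunk in enumerate(pd.read_csv(path, chunksize=chunksize)):
        # Use a different seed per chunk so the chunks do not all
        # select the same row positions.
        sampled_parts.append(chunk.sample(frac=frac, random_state=seed + i))
    # Combine the per-chunk samples into one DataFrame with a fresh index.
    return pd.concat(sampled_parts, ignore_index=True)
```

The resulting DataFrame should be small enough to split into features and target and pass to AutoML for a first proof of concept. If the results look promising, the same pipeline can be rerun with a larger `frac` on a machine with more RAM.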