auto-sklearn: Incremental Learning or partial_fit?

Incremental Learning or partial_fit?

Open luckyhug opened this issue 3 years ago • 3 comments

Does auto-sklearn support incremental learning or partial_fit?

The dataset is too big for RAM: about 230+ GB. I can store it in a list, but there is not enough memory to convert the list to an np array.

Is there any advice or examples on dealing with this dataset?

Thank you very much!

Aug 17 '22 02:08 luckyhug

Hi @luckyhug,

No, I don't think there's an effective way to use that much data in auto-sklearn natively. My only suggestion would be to run auto-sklearn on a subsample of the data and use show_models() to see which models and hyperparameters it selects. You can then fit those configurations in the next step of your pipeline and handle the incremental learning and partial fitting in a custom manner.

Best, Eddie

Aug 17 '22 10:08 eddiebergman
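The second half of the suggestion above (training on the full data chunk by chunk, outside auto-sklearn) could look roughly like the sketch below. It is only an illustration under assumptions: `SGDClassifier` stands in for whatever model the subsample run points to, `iter_chunks` is a hypothetical loader that here generates synthetic data instead of reading from disk, and the auto-sklearn step itself is not shown. Note that only scikit-learn estimators that implement `partial_fit` can be trained this way.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_chunks(n_chunks, chunk_size, seed=0):
    """Hypothetical loader: yields (X, y) chunks that each fit in memory.

    Synthetic data is generated here; in practice each chunk would be
    read from disk (for example, one file or one slice at a time).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 5)).astype(np.float32)
        y = (X[:, 0] + X[:, 1] > 0).astype(np.int64)
        yield X, y


# partial_fit needs the full set of class labels, because a single
# chunk is not guaranteed to contain every class.
classes = np.array([0, 1])

# Assumption: a linear model trained with SGD; swap in the estimator and
# hyperparameters suggested by the subsample run, if it supports partial_fit.
model = SGDClassifier(random_state=0)

for X_chunk, y_chunk in iter_chunks(n_chunks=10, chunk_size=1000):
    model.partial_fit(X_chunk, y_chunk, classes=classes)

# Evaluate on a held-out chunk drawn with a different seed.
X_test, y_test = next(iter_chunks(n_chunks=1, chunk_size=2000, seed=1))
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because only one chunk is held in memory at a time, peak memory use depends on `chunk_size` rather than on the size of the full dataset.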

@luckyhug did you already consider downcasting numerical values, e.g. from float64 to float16? This could reduce your memory consumption by a factor of 4.

Sep 07 '22 22:09 jonaslandsgesell
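As a quick check of the factor-of-4 figure above, here is a small NumPy sketch on a synthetic array (the array shape is an arbitrary assumption). It also measures the rounding error that float16 introduces, since the memory saving comes at the cost of precision and a much smaller value range.

```python
import numpy as np

rng = np.random.default_rng(0)

# NumPy generates float64 by default: 8 bytes per value.
X64 = rng.normal(size=(1000, 10))

# Downcast copies: 4 bytes and 2 bytes per value respectively.
X32 = X64.astype(np.float32)
X16 = X64.astype(np.float16)

print(X64.nbytes, X32.nbytes, X16.nbytes)  # 80000 40000 20000

# Largest absolute rounding error introduced by the float16 cast.
max_err16 = np.max(np.abs(X64 - X16.astype(np.float64)))
print(f"max abs error after float16 cast: {max_err16:.2e}")
```

float32 is often the safer intermediate choice: it halves memory use while keeping far more precision than float16.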

@jonaslandsgesell good idea! We already do that automatically if the dataset is too large. We also automatically subsample the data if it's too large to fit in memory, but that means not all of the original data gets used.

Sep 12 '22 10:09 eddiebergman
