
TunedRF fails on some large datasets under time constraints

Open PGijsbers opened this issue 3 years ago • 3 comments

Per @mfeurer in #337:

> Alright, I can now pretty much reproduce the results, except for 4 exceptions:
>
> - helena: the inner process is killed without an error message (the only output in the logs is `KILLED`)
> - dionis: same
> - airlines: runs over the time limit. As I'm only running 1h, it probably works in the 4h setting
> - covertype: same

We should confirm that this is an issue of the evaluations not completing in time. There is only so much we can do to fix it while keeping the baseline understandable, but we might e.g. use hold-out for evaluation on large datasets instead of 5-fold CV, and/or use the models trained during 5-fold CV directly instead of retraining at the end.
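Concretely, the hold-out variant could look something like this (a rough sketch with scikit-learn; the size threshold, split ratio, and function name are illustrative, not anything that exists in the benchmark):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Illustrative cutoff: switch from 5-fold CV to a single hold-out split
# once the dataset is "large" (the value here is arbitrary).
HOLDOUT_THRESHOLD = 100_000

def score_candidate(X, y, max_features, seed=0):
    model = RandomForestClassifier(
        n_estimators=100, max_features=max_features, random_state=seed)
    if len(X) > HOLDOUT_THRESHOLD:
        # Hold-out: one fit per candidate instead of five.
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=0.1, stratify=y, random_state=seed)
        return model.fit(X_tr, y_tr).score(X_va, y_va)
    # Small data: keep the original 5-fold CV estimate.
    return cross_val_score(model, X, y, cv=5).mean()
```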

PGijsbers · Jul 15 '21 08:07

> we might e.g. use hold-out for evaluation on large datasets instead of 5-fold CV

Sounds reasonable. We would probably also need to change the budget allocation in this case, to ensure that the final model trained on the full dataset still has enough time to complete: from 85/15 to maybe 50/50? Or the refit budget could be estimated from the first RF model trained (or the 3rd? the one using all features, and therefore the slowest).
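Something along these lines, perhaps (sketch only; `fit_timed`, `reserve_refit_budget`, and the safety factor are made up for illustration, not existing benchmark code):

```python
import time

from sklearn.ensemble import RandomForestClassifier

def fit_timed(model, X, y):
    """Fit the model and return it together with its wall-clock fit time."""
    start = time.monotonic()
    model.fit(X, y)
    return model, time.monotonic() - start

def reserve_refit_budget(total_seconds, X, y, safety_factor=2.0):
    """Time one fit of the all-features candidate (the slowest one, per the
    reasoning above) and reserve a multiple of that for the final refit,
    capped at half the budget (i.e. falling back to a 50/50 split)."""
    probe = RandomForestClassifier(n_estimators=100, max_features=None)
    _, fit_seconds = fit_timed(probe, X, y)
    refit_seconds = min(total_seconds / 2, safety_factor * fit_seconds)
    return total_seconds - refit_seconds, refit_seconds  # (tuning, refit)
```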

> and/or use the models trained during 5-fold CV directly instead of retraining at the end

What do you mean here exactly? Use the 5 CV models of the best max_features, compute predictions on the test dataset for each of them, and apply some voting mechanism to obtain the final predictions?

sebhrusen · Jul 16 '21 16:07

> Use the 5 CV models of the best max_features, compute predictions on the test dataset for each of them, and apply some voting mechanism to obtain the final predictions?

Yes, using the average as the voting scheme. Table 2 in Caruana et al. (2006) suggests that this leads to better performance than a single model retrained on all data (MODSEL-BOTH-CV vs. MODSEL-BOTH). It's nice because it doesn't require a refit, but it wouldn't help us if we do start using hold-out for large datasets.
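For reference, the averaging scheme is just a soft vote over the fold models. A self-contained sketch (in practice the five models from the tuning loop would be kept and reused, rather than refit as done here):

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def cv_ensemble_predict_proba(X_train, y_train, X_test, best_max_features):
    """Fit one model per CV fold and average their predicted probabilities
    on the test set (assumes every class has at least 5 training samples,
    so each fold's training split sees all classes)."""
    base = RandomForestClassifier(
        n_estimators=100, max_features=best_max_features, random_state=0)
    probas = []
    for fold_idx, _ in StratifiedKFold(n_splits=5).split(X_train, y_train):
        model = clone(base).fit(X_train[fold_idx], y_train[fold_idx])
        probas.append(model.predict_proba(X_test))
    return np.mean(probas, axis=0)  # average = soft vote over fold models
```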

PGijsbers · Jul 19 '21 08:07

Just to note, this has been largely (though not entirely) fixed by https://github.com/openml/automlbenchmark/pull/441

PGijsbers · Feb 21 '22 19:02