Current performance evaluation objects, recently added to TunedModel histories, are too big
There's evidence that the recent addition of full `PerformanceEvaluation` objects to `TunedModel` histories is blowing up memory requirements in real use cases.

I propose that we create two performance evaluation objects: the detailed one we have now, and a new `CompactPerformanceEvaluation` object. The `evaluate` method gets a new keyword argument `compact=false`, and `TunedModel` gets a new hyperparameter `compact_history=true`. (This default would technically break MLJTuning, but I doubt this would affect more than one or two users - and the recent change is not actually documented anywhere yet.)
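To make the proposal concrete, here is a rough sketch of the interface from the user's side. The `compact` keyword and the `compact_history` hyperparameter are hypothetical at this stage; the rest is the existing MLJ API:

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree
model = Tree()

# current behaviour: `evaluate` returns a full `PerformanceEvaluation`
e = evaluate(model, X, y; resampling=CV(nfolds=6), measure=log_loss)

# proposed: opt in to the lightweight object (hypothetical kwarg)
e_compact = evaluate(model, X, y; resampling=CV(nfolds=6),
                     measure=log_loss, compact=true)

# proposed: compact history entries by default (hypothetical hyperparameter)
r = range(model, :max_depth, lower=1, upper=5)
tuned = TunedModel(model; range=r, resampling=CV(nfolds=6),
                   measure=log_loss, compact_history=true)
```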
This would also allow us to ultimately address https://github.com/alan-turing-institute/MLJ.jl/issues/575, which was shelved for fear of making evaluation objects too big.
Further thoughts anyone?
cc @CameronBieganek, @OkonSamuel
Below are the fields of the current struct. I've ticked the fields suggested for the compact case (see the struct sketch after the list). I suppose the only one that might be controversial is `observations_per_fold`. This was always included in `TunedModel` histories previously, so it seems less disruptive to include it.
Fields
These fields are part of the public API of the `PerformanceEvaluation` struct.
- [x] `model`: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.
- [x] `measure`: vector of measures (metrics) used to evaluate performance.
- [x] `measurement`: vector of measurements - one for each element of `measure` - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure `m` is `StatisticalMeasuresBase.external_aggregation_mode(m)` (commonly `Mean()` or `Sum()`).
- [x] `operation` (e.g., `predict_mode`): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: `$PREDICT_OPERATIONS_STRING`.
- [x] `per_fold`: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.
- [x] `per_observation`: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation `e`, `e.per_observation[m][f][i]` is the measurement for the `i`th observation in the `f`th test fold, evaluated using the `m`th measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure `measure` is repeated across all observations in a fold if `StatisticalMeasures.can_report_unaggregated(measure) == true`. If `e` has been computed with the `per_observation=false` option, then `e.per_observation` is a vector of `missing`s.
- [ ] `fitted_params_per_fold`: a vector containing `fitted_params(mach)` for each machine `mach` trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.
- [ ] `report_per_fold`: a vector containing `report(mach)` for each machine `mach` trained during resampling - one machine per train/test pair.
- [ ] `train_test_rows`: a vector of tuples, each of the form `(train, test)`, where `train` and `test` are vectors of row (observation) indices for training and evaluation respectively.
- [x] `resampling`: the resampling strategy used to generate the train/test pairs.
- [x] `repeats`: the number of times the resampling strategy was repeated.
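For concreteness, a minimal sketch of what the compact struct could look like, keeping only the ticked fields. Field names mirror the current struct; the abstract supertype and the untyped fields are illustrative assumptions, not existing MLJBase API:

```julia
# Hypothetical sketch only - not a committed design:
abstract type AbstractPerformanceEvaluation end

struct CompactPerformanceEvaluation <: AbstractPerformanceEvaluation
    model            # model evaluated (in the tuning case, the best model found)
    measure          # vector of measures
    measurement      # one aggregated measurement per measure
    operation        # e.g., predict, predict_mode
    per_fold         # per-measure vectors of per-fold results
    per_observation  # per-measure, per-fold, per-observation results
    resampling       # e.g., CV(nfolds=6)
    repeats::Int     # number of resampling repeats
end
```

One design option: giving both structs a common abstract supertype, as sketched, would let downstream code dispatch on either variant.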
Also relevant: https://github.com/alan-turing-institute/MLJ.jl/issues/1025
To do:
- [ ] https://github.com/JuliaAI/MLJBase.jl/pull/973
- [ ] https://github.com/JuliaAI/MLJTuning.jl/pull/215