
Current performance evaluation objects, recently added to TunedModel histories, are too big

Open ablaom opened this issue 1 year ago • 2 comments

There's evidence that the recent addition of full PerformanceEvaluation objects to TunedModel histories is blowing up memory requirements in real use cases.

I propose that we create two performance evaluation objects - a detailed one (as we have now) and a new CompactPerformanceEvaluation object. The evaluate method gets a new keyword argument compact=false and TunedModel gets a new hyperparameter compact_history=true. (This default would technically break MLJTuning, but I doubt it would affect more than one or two users - and the recent change is not actually documented anywhere yet.)
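To make the proposal concrete, here is a rough sketch of how the two new keywords might be used. Nothing below is implemented: compact and compact_history are the names proposed above, and the model, data, and tuning range are placeholders only.

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
tree = Tree()

# Proposed: `compact=true` would return a slimmed-down CompactPerformanceEvaluation
# instead of the full PerformanceEvaluation object.
e = evaluate(tree, X, y; resampling=CV(nfolds=6), measure=log_loss, compact=true)

# Proposed: `compact_history=true` (the suggested default) would store only the
# compact evaluations in the TunedModel history, keeping it small.
tuned_tree = TunedModel(
    model=tree,
    range=range(tree, :max_depth, lower=1, upper=6),
    measure=log_loss,
    resampling=CV(nfolds=6),
    compact_history=true,
)
```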

This would also allow us to ultimately address https://github.com/alan-turing-institute/MLJ.jl/issues/575, which was shelved for fear of making evaluation objects too big.

Further thoughts, anyone?

cc @CameronBieganek, @OkonSamuel

Below are the fields of the current struct. I've ticked off suggested fields for the compact case. I suppose the only one that might be controversial is observations_per_fold. This was always included in TunedModel histories previously, so it seems less disruptive to include it.

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • [x] model: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.

  • [x] measure: vector of measures (metrics) used to evaluate performance

  • [x] measurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • [x] operation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: $PREDICT_OPERATIONS_STRING.

  • [x] per_fold: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.

  • [x] per_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure (see the sketch following this list). Useful for some forms of hyper-parameter optimization. Note that the aggregated measurement is repeated across all observations in a fold whenever the measure satisfies StatisticalMeasures.can_report_unaggregated(measure) == false. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.

  • [ ] fitted_params_per_fold: a vector containing fitted_params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.

  • [ ] report_per_fold: a vector containing report(mach) for each machine mach trained during resampling - one machine per train/test pair.

  • [ ] train_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.

  • [x] resampling: the resampling strategy used to generate the train/test pairs.

  • [x] repeats: the number of times the resampling strategy was repeated.
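For anyone wanting to poke at the current objects, here is a minimal sketch of how the fields above are accessed; the model, data, and measures are arbitrary examples, and the comments about what a compact object would drop just restate the ticks above.

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
e = evaluate(Tree(), X, y; resampling=CV(nfolds=3), measure=[log_loss, accuracy])

e.measure                    # the two measures above
e.measurement                # one aggregated value per measure
e.per_fold[1]                # three per-fold log_loss values
e.per_observation[1][2][5]   # log_loss for observation 5 in test fold 2
e.fitted_params_per_fold     # bulky; would be dropped from the compact object
e.report_per_fold            # likewise dropped
e.train_test_rows            # likewise dropped
```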

ablaom · Apr 17 '24 23:04

Also relevant: https://github.com/alan-turing-institute/MLJ.jl/issues/1025

ablaom · Apr 17 '24 23:04

To do:

  • [ ] https://github.com/JuliaAI/MLJBase.jl/pull/973
  • [ ] https://github.com/JuliaAI/MLJTuning.jl/pull/215

ablaom · Apr 24 '24 02:04