[ENH] Validation loss for deep learning clustering
Describe the feature or idea you want to propose
I am testing the different auto-encoders implemented in aeon. I have noticed that the models only store the training loss, accessible for plotting/inspection via the model; however, no validation loss is recorded.
Describe your proposed solution
I made a small addition in the self.training_model_.fit() in
https://github.com/aeon-toolkit/aeon/blob/0412d5b50272f58efbcd989ffe18bb46d6378965/aeon/clustering/deep_learning/_ae_abgru.py#L284-L292
where only a new argument, validation_split (a float in [0, 1)), is needed according to the Keras API: https://keras.io/api/models/model_training_apis/
I did this in my experiments and it seems to work fine. I would consider this enhancement very useful: naive users might go with the default run of 2000 n_epochs and end up overfitting all their models, at which point the manifold learning and the subsequent clustering make little sense.
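A minimal sketch of the proposed change (the validation helper is hypothetical, and the fit call is shown as a comment because it depends on each clusterer's internals; the validation_split semantics are taken from the Keras Model.fit docs):

```python
# Hypothetical sketch: forwarding a validation_split argument to Keras'
# Model.fit (https://keras.io/api/models/model_training_apis/).
# validation_split must be a float in [0, 1); Keras holds out that
# fraction of the training data for per-epoch validation.

def check_validation_split(validation_split: float) -> float:
    """Validate the proposed clusterer parameter before passing it to Keras."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError(
            f"validation_split must be in [0, 1), got {validation_split}"
        )
    return validation_split

# Inside the clusterer's fit, the call would then look roughly like:
#   self.history = self.training_model_.fit(
#       X, X,
#       epochs=self.n_epochs,
#       validation_split=self.validation_split,  # new parameter
#       callbacks=self.callbacks_,
#   )
# With validation_split > 0, history.history gains a "val_loss" key
# alongside "loss", which is what enables overfitting diagnostics.
```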
I would be happy to create a PR for most, if not all, of the models in the deep learning clustering module. Basically, it would be a new argument (validation_split) in the initialization of the clusterer classes.
I am new to Keras, so please let me know if I overlooked any easier way to do this.
Describe alternatives you've considered, if relevant
No response
Additional context
No response
Another open question on my side is how save_best_model works. It is creating a callbacks.ModelCheckpoint(..., monitor="loss", save_best_only=True).
So the best model is defined as the one with min(loss)? I would switch this to a validation loss criterion instead, as the current one overlooks the issue of overfitting.
Hello, thank you for raising this enhancement proposal. We have been asked many times before whether the deep learning models support a validation split, in classification, regression, and clustering; it has always been on our to-do list, but we still haven't done it, so I am happy to have this functionality. To make things clear, since you also mentioned the save_best_model behaviour:
We follow this rule: a deep learning model can be trained for an arbitrary number of epochs, so using the "last model" for evaluation in predict would be essentially random. In practice we use the "best model" for evaluation. The "best model" can be chosen either on the validation loss, if a validation split is provided, or on the training loss if not. By default in aeon, the ModelCheckpoint saves the best model to file using the training loss. If save_best_model is set to True, the Keras file is kept on disk after fit is done; if it is False, the file is deleted (see the docs of any deep model in aeon). The saving directory is the file_path parameter, and the file name is the best_file_name parameter with a .keras extension appended.
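The file handling described above can be sketched as follows (the helper names are mine, not aeon's; file_path, best_file_name, and save_best_model are the documented aeon parameters):

```python
import os

def best_model_path(file_path: str, best_file_name: str) -> str:
    """Illustrative helper: where the described ModelCheckpoint writes the
    best model, i.e. best_file_name inside file_path with a .keras extension."""
    return os.path.join(file_path, best_file_name + ".keras")

def cleanup_after_fit(path: str, save_best_model: bool) -> None:
    """Sketch of the described behaviour: keep the .keras file on disk only
    when save_best_model is True, otherwise delete it after fit."""
    if not save_best_model and os.path.exists(path):
        os.remove(path)
```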
So if you want to add the validation_split parameter, I am more than happy to help review the PR; however, the following should be done:
- A new parameter called validation_split, defaulting to 0.0, explained in the docs similarly to its docs in Keras (see here)
- A mention in the docs that if validation_split is > 0.0, then "val_loss" is used in the ModelCheckpoint
- In the code where the ModelCheckpoint is created, an if/else statement: if validation_split is zero, use "loss" as the "monitor"; if it is > 0.0, use "val_loss", with the rest of the functionality kept the same
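The if/else rule in the last bullet can be sketched like this (the function name is hypothetical, and the ModelCheckpoint call is shown as a comment since the surrounding code lives in each clusterer):

```python
def checkpoint_monitor(validation_split: float) -> str:
    """Sketch of the requested rule: monitor the validation loss only
    when a validation split is actually in use."""
    return "val_loss" if validation_split > 0.0 else "loss"

# The ModelCheckpoint construction would then read roughly:
#   callbacks.ModelCheckpoint(
#       filepath=...,
#       monitor=checkpoint_monitor(self.validation_split),
#       save_best_only=True,
#   )
```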
Also, there should be a new test just for this functionality to make sure it is working, under aeon/clustering/deep_learning/tests, in a new file called test_deep_clustering_validation.py.
I am happy to get this into clustering only for now; once it is in and tested, we can bring it to classification and regression.