ignite
Sync trainer state with evaluators
🚀 Feature
There can be use cases where we would like to get the trainer's epoch/iteration and/or other items from trainer.state. Let's propose an API that makes it easy to get the trainer's state from an evaluator.
Context: https://discuss.pytorch.org/t/get-current-epoch-inside-process-function-of-evaluator/162926
Many handlers/metrics accept a global_step_transform argument to get the step they want.
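To make the global_step_transform idea concrete, here is a minimal sketch. Such a transform is a callable that receives the calling engine and the event name and returns the step to report. The SimpleNamespace objects are stand-ins for ignite engines (an assumption, so that the snippet runs without ignite installed), and the helper name step_from_trainer is hypothetical. ignite's own global_step_from_engine helper serves a similar purpose.

```python
from types import SimpleNamespace


def step_from_trainer(trainer):
    """Build a transform that ignores the calling engine and reports the trainer's epoch."""

    def transform(engine, event_name):
        # `engine` is whoever fired the event (e.g. the evaluator); we read the trainer instead.
        return trainer.state.epoch

    return transform


# Stand-ins for a trainer and an evaluator (hypothetical, not ignite objects).
trainer = SimpleNamespace(state=SimpleNamespace(epoch=3, iteration=120))
evaluator = SimpleNamespace(state=SimpleNamespace(epoch=1, iteration=40))

transform = step_from_trainer(trainer)
global_step = transform(evaluator, "EPOCH_COMPLETED")
print(global_step)  # the trainer's epoch, not the evaluator's
```

Because the transform closes over the trainer, it always reflects the trainer's current state at call time, with no explicit syncing step.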
Can I work on this? I am pretty new to this.
@jalajk24 right now it is still under discussion whether we need to work on something here. Do you have any ideas or suggestions on the topic?
I am proposing a new API function for the Engine class that can fetch the epoch from an instance of the trainer.
It could work this way, and it would also return the current trainer epoch:

    def fetch_trainer_epoch(self, trainer: Engine):
        # Read the epoch from the trainer and cache it on this engine's state
        epoch = trainer.state.epoch
        self.state.trainer_epoch = epoch
        return epoch

@vfdev-5 does this make sense?
It can be called like optimizer.step()
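A rough sketch of how such a call could look. MiniEngine is a hypothetical stand-in written for this illustration; it is not ignite's Engine, and fetch_trainer_epoch is only the method proposed in this thread, not an existing ignite API.

```python
from types import SimpleNamespace


class MiniEngine:
    """Hypothetical stand-in for an engine; NOT ignite's Engine class."""

    def __init__(self):
        self.state = SimpleNamespace(epoch=0, iteration=0)

    def fetch_trainer_epoch(self, trainer):
        # Copy the trainer's current epoch onto this engine's state and return it.
        epoch = trainer.state.epoch
        self.state.trainer_epoch = epoch
        return epoch


trainer = MiniEngine()
evaluator = MiniEngine()

trainer.state.epoch = 5  # pretend the trainer has finished five epochs
fetched = evaluator.fetch_trainer_epoch(trainer)  # explicit call, like optimizer.step()
print(fetched, evaluator.state.trainer_epoch)
```

Note that the cached value is a snapshot: it goes stale unless the method is called again after the trainer advances.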
The core question of the issue is whether to abstract a trainer in ignite. It's not a good idea from what I know of ignite, or at least the core of it.
Hey @louis-she, I guess the API can be helpful for comparing the performance of two or more different training methods; it can also help when training ensemble models. I have been working with GANs and adversarial training, and I have noticed that sometimes you need to combine two training methods to get better results, so this may be a helpful addition to the Engine class.
@guptaaryan16 can you please give a concrete example of what you are talking about?
Sure @vfdev-5, I think it will be mostly useful for hyperparameter tuning and for testing how results vary, to make training easier, for example by reducing the number of epochs and testing different training methods.
For instance, here is something that happened when I was training a model on CIFAR-10 with Gaussian augmentation training (https://arxiv.org/abs/1902.02918) to measure the Average Certified Radius (ACR) of the model using randomized smoothing. I noticed that if I added PGD adversarial training (https://arxiv.org/pdf/1706.06083.pdf) on top of the Gaussian augmentation training, I could get a very high ACR, but to get the specific hyperparameters you need to get the current training epoch and see where the evaluators are getting the best results. So this API may be helpful, although you can also get the specific epoch without it.
@guptaaryan16 thanks for the details, but I was wondering more about code details. Can you provide some code to highlight your idea? As for HP tuning and multiple experiments, you can check:
- HP tuning tutorial: https://github.com/pytorch/ignite/blob/master/examples/notebooks/Cifar10_Ax_hyperparam_tuning.ipynb
- Experiment tracking e.g. with ClearML: https://pytorch-ignite.ai/how-to-guides/10-loggers/
"... get the specific hyperparameters you need to get the current training epoch and see where the evaluators are getting best results."
I think there is nothing impossible here. I imagine that you have a handler to run validation:

    import torch

    # trainer, evaluator, model and val_data are assumed to be defined elsewhere
    best_acr = 0.0

    def run_validation():
        global best_acr  # needed to rebind the module-level value
        evaluator.run(val_data)
        metrics = evaluator.state.metrics
        if metrics["ACR"] > best_acr:
            best_acr = metrics["ACR"]
            current_epoch = trainer.state.epoch
            # save locally a bundle:
            fp = f"/path/to/output/{current_epoch}_best_acr.pt"
            torch.save({
                "best_acr": best_acr,
                "epoch": current_epoch,
                "model": model.state_dict(),
                # ...
            }, fp)

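The same bookkeeping can be sketched without module-level state by closing over the trainer. This is only an illustration under stated assumptions: the SimpleNamespace objects stand in for ignite engines, make_best_tracker is a hypothetical helper name, and the ACR values are made-up placeholders rather than real results.

```python
from types import SimpleNamespace


def make_best_tracker(trainer, metric_name):
    """Return a function that records the best metric value and the trainer epoch it occurred at."""
    best = {"value": float("-inf"), "epoch": None}

    def update(evaluator):
        value = evaluator.state.metrics[metric_name]
        if value > best["value"]:
            best["value"] = value
            # The trainer is reachable through the closure, so no syncing API is needed.
            best["epoch"] = trainer.state.epoch
        return best

    return update


# Stand-ins for a trainer and an evaluator (hypothetical, not ignite objects).
trainer = SimpleNamespace(state=SimpleNamespace(epoch=0))
evaluator = SimpleNamespace(state=SimpleNamespace(metrics={}))
update_best = make_best_tracker(trainer, "ACR")

# Placeholder metric values, one per simulated epoch.
for epoch, acr in enumerate([0.40, 0.55, 0.50], start=1):
    trainer.state.epoch = epoch
    evaluator.state.metrics = {"ACR": acr}  # pretend evaluator.run(...) produced this
    best = update_best(evaluator)

print(best)  # {'value': 0.55, 'epoch': 2}
```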
Yes @vfdev-5, I do not have the specific code for that, but I can imagine that it was written along the same lines (that project did not use ignite).
Also, I was wondering whether we could access the epoch directly as trainer.epoch instead of trainer.state.epoch, as it can make a bit more sense because I don't think we can have different states within the same trainer anyway.
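A minimal sketch of what such a shortcut could look like, assuming a read-only property that simply delegates to the state. EngineWithShortcut is a hypothetical stand-in class for illustration only; it is not ignite's Engine, and the epoch property is not an existing ignite attribute as far as this thread shows.

```python
from types import SimpleNamespace


class EngineWithShortcut:
    """Hypothetical stand-in for an engine; NOT ignite's Engine class."""

    def __init__(self):
        self.state = SimpleNamespace(epoch=0, iteration=0)

    @property
    def epoch(self):
        # Read-only view: the state object remains the single source of truth.
        return self.state.epoch


trainer = EngineWithShortcut()
trainer.state.epoch = 7
print(trainer.epoch == trainer.state.epoch)  # both names read the same value
```

Keeping the property read-only avoids having two places where the epoch can be written.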