ignite
ignite copied to clipboard
Improvements for resuming from terminated engine
Current Engine's behaviour when we resume the run from terminated state is the following:
from ignite.engine import Engine, Events
from ignite.utils import setup_logger, logging
def func(engine, batch):
print(engine.state.epoch, engine.state.iteration, " | ", batch)
max_epochs = 4
data = range(10)
engine = Engine(func)
engine.logger = setup_logger("engine", level=logging.DEBUG)
@engine.on(Events.ITERATION_COMPLETED(once=14))
def terminate():
print("-> terminate")
engine.terminate()
engine.run(data, max_epochs=max_epochs)
engine.run(data, max_epochs=max_epochs)
print(engine.state.epoch, engine.state.iteration)
Output:
2022-08-23 15:42:07,681 engine INFO: Engine run starting with max_epochs=4.
2022-08-23 15:42:07,682 engine DEBUG: 0 | 0, Firing handlers for event Events.STARTED
2022-08-23 15:42:07,683 engine DEBUG: 1 | 0, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,683 engine DEBUG: 1 | 0, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,685 engine DEBUG: 1 | 0, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,687 engine DEBUG: 1 | 1, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,688 engine DEBUG: 1 | 1, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,688 engine DEBUG: 1 | 1, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,689 engine DEBUG: 1 | 1, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,690 engine DEBUG: 1 | 2, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,691 engine DEBUG: 1 | 2, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,692 engine DEBUG: 1 | 2, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,692 engine DEBUG: 1 | 2, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,693 engine DEBUG: 1 | 3, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,693 engine DEBUG: 1 | 3, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,695 engine DEBUG: 1 | 3, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,696 engine DEBUG: 1 | 3, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,697 engine DEBUG: 1 | 4, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,698 engine DEBUG: 1 | 4, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,698 engine DEBUG: 1 | 4, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,700 engine DEBUG: 1 | 4, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,701 engine DEBUG: 1 | 5, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,702 engine DEBUG: 1 | 5, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,703 engine DEBUG: 1 | 5, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,704 engine DEBUG: 1 | 5, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,706 engine DEBUG: 1 | 6, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,707 engine DEBUG: 1 | 6, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,708 engine DEBUG: 1 | 6, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,708 engine DEBUG: 1 | 6, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,709 engine DEBUG: 1 | 7, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,711 engine DEBUG: 1 | 7, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,712 engine DEBUG: 1 | 7, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,714 engine DEBUG: 1 | 7, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,714 engine DEBUG: 1 | 8, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,716 engine DEBUG: 1 | 8, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,717 engine DEBUG: 1 | 8, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,717 engine DEBUG: 1 | 8, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,719 engine DEBUG: 1 | 9, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,719 engine DEBUG: 1 | 9, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,720 engine DEBUG: 1 | 9, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,720 engine DEBUG: 1 | 9, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,721 engine DEBUG: 1 | 10, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,722 engine DEBUG: 1 | 10, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,722 engine DEBUG: 1 | 10, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,723 engine INFO: Epoch[1] Complete. Time taken: 00:00:00.039
2022-08-23 15:42:07,723 engine DEBUG: 2 | 10, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,724 engine DEBUG: 2 | 10, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,724 engine DEBUG: 2 | 10, Firing handlers for event Events.DATALOADER_STOP_ITERATION
2022-08-23 15:42:07,725 engine DEBUG: 2 | 10, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,725 engine DEBUG: 2 | 11, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,726 engine DEBUG: 2 | 11, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,727 engine DEBUG: 2 | 11, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,727 engine DEBUG: 2 | 11, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,728 engine DEBUG: 2 | 12, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,728 engine DEBUG: 2 | 12, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,729 engine DEBUG: 2 | 12, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,729 engine DEBUG: 2 | 12, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,730 engine DEBUG: 2 | 13, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,733 engine DEBUG: 2 | 13, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,734 engine DEBUG: 2 | 13, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,734 engine DEBUG: 2 | 13, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,735 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,736 engine DEBUG: 2 | 14, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,737 engine INFO: Terminate signaled. Engine will stop after current iteration is finished.
2022-08-23 15:42:07,737 engine DEBUG: 2 | 14, Firing handlers for event Events.TERMINATE_SINGLE_EPOCH
2022-08-23 15:42:07,738 engine DEBUG: 2 | 14, Firing handlers for event Events.TERMINATE
2022-08-23 15:42:07,739 engine INFO: Epoch[2] Complete. Time taken: 00:00:00.015
2022-08-23 15:42:07,739 engine DEBUG: 2 | 14, Firing handlers for event Events.COMPLETED
2022-08-23 15:42:07,740 engine INFO: Engine run complete. Time taken: 00:00:00.058
2022-08-23 15:42:07,741 engine INFO: Engine run resuming from iteration 14, epoch 2 until 4 epochs
2022-08-23 15:42:07,741 engine DEBUG: 2 | 14, Firing handlers for event Events.STARTED
2022-08-23 15:42:07,742 engine DEBUG: 3 | 14, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,743 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,743 engine DEBUG: 3 | 14, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,744 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,745 engine DEBUG: 3 | 15, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,746 engine DEBUG: 3 | 15, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,746 engine DEBUG: 3 | 15, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,747 engine DEBUG: 3 | 16, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,748 engine DEBUG: 3 | 16, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,748 engine DEBUG: 3 | 16, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,749 engine DEBUG: 3 | 16, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,749 engine DEBUG: 3 | 17, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,750 engine DEBUG: 3 | 17, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,750 engine DEBUG: 3 | 17, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,751 engine DEBUG: 3 | 17, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,751 engine DEBUG: 3 | 18, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,754 engine DEBUG: 3 | 18, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,755 engine DEBUG: 3 | 18, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,755 engine DEBUG: 3 | 18, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,756 engine DEBUG: 3 | 19, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,757 engine DEBUG: 3 | 19, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,758 engine DEBUG: 3 | 19, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,758 engine DEBUG: 3 | 19, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,759 engine DEBUG: 3 | 20, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,759 engine DEBUG: 3 | 20, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,761 engine DEBUG: 3 | 20, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,761 engine INFO: Epoch[3] Complete. Time taken: 00:00:00.018
2022-08-23 15:42:07,762 engine DEBUG: 4 | 20, Firing handlers for event Events.EPOCH_STARTED
2022-08-23 15:42:07,762 engine DEBUG: 4 | 20, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,763 engine DEBUG: 4 | 20, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,763 engine DEBUG: 4 | 21, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,764 engine DEBUG: 4 | 21, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,764 engine DEBUG: 4 | 21, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,765 engine DEBUG: 4 | 21, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,766 engine DEBUG: 4 | 22, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,766 engine DEBUG: 4 | 22, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,767 engine DEBUG: 4 | 22, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,768 engine DEBUG: 4 | 22, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,769 engine DEBUG: 4 | 23, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,769 engine DEBUG: 4 | 23, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,770 engine DEBUG: 4 | 23, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,771 engine DEBUG: 4 | 23, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,771 engine DEBUG: 4 | 24, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,772 engine DEBUG: 4 | 24, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,773 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,774 engine DEBUG: 4 | 24, Firing handlers for event Events.DATALOADER_STOP_ITERATION
2022-08-23 15:42:07,774 engine DEBUG: 4 | 24, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,775 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,776 engine DEBUG: 4 | 25, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,776 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,777 engine DEBUG: 4 | 25, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,778 engine DEBUG: 4 | 26, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,779 engine DEBUG: 4 | 26, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,780 engine DEBUG: 4 | 26, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,781 engine DEBUG: 4 | 26, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,781 engine DEBUG: 4 | 27, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,782 engine DEBUG: 4 | 27, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,783 engine DEBUG: 4 | 27, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,783 engine DEBUG: 4 | 27, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,784 engine DEBUG: 4 | 28, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,784 engine DEBUG: 4 | 28, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,785 engine DEBUG: 4 | 28, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,785 engine DEBUG: 4 | 28, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,786 engine DEBUG: 4 | 29, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,787 engine DEBUG: 4 | 29, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,787 engine DEBUG: 4 | 29, Firing handlers for event Events.GET_BATCH_STARTED
2022-08-23 15:42:07,788 engine DEBUG: 4 | 29, Firing handlers for event Events.GET_BATCH_COMPLETED
2022-08-23 15:42:07,788 engine DEBUG: 4 | 30, Firing handlers for event Events.ITERATION_STARTED
2022-08-23 15:42:07,789 engine DEBUG: 4 | 30, Firing handlers for event Events.ITERATION_COMPLETED
2022-08-23 15:42:07,790 engine DEBUG: 4 | 30, Firing handlers for event Events.EPOCH_COMPLETED
2022-08-23 15:42:07,791 engine INFO: Epoch[4] Complete. Time taken: 00:00:00.029
2022-08-23 15:42:07,791 engine DEBUG: 4 | 30, Firing handlers for event Events.COMPLETED
2022-08-23 15:42:07,792 engine INFO: Engine run complete. Time taken: 00:00:00.051
1 1 | 0
1 2 | 1
1 3 | 2
1 4 | 3
1 5 | 4
1 6 | 5
1 7 | 6
1 8 | 7
1 9 | 8
1 10 | 9
2 11 | 0
2 12 | 1
2 13 | 2
2 14 | 3
-> terminate
3 15 | 0
3 16 | 1
3 17 | 2
3 18 | 3
3 19 | 4
3 20 | 5
4 21 | 6
4 22 | 7
4 23 | 8
4 24 | 9
4 25 | 0
4 26 | 1
4 27 | 2
4 28 | 3
4 29 | 4
4 30 | 5
(4, 30)
We can see that the engine stopped at epoch 2 and when engine is resumed, epochs are counted from 3. The epoch 3 does not count 10 iterations and ends earlier (after 6 iterations).
Maybe, an expected behaviour while resuming could be to count epochs from 2 and count 10 iterations before switching to the epoch 4 ?
@sadra-barikbin what do you think ?
EDIT: a suggestion from https://github.com/pytorch/ignite/pull/2645#discussion_r952675098
It's due to iter_counter local variable (in run_once_on_dataset) which has been initialized as state.iteration % epoch_length (3) and causes engine to stop 7 batches later when iter_counter == epoch_length.
Hence, iter_counter should start with zero.
I agree with Sadra's comment here: https://github.com/pytorch/ignite/pull/2645#discussion_r952231156
that final iteration should be 34 == epoch_length * (max_epochs - 1) + (iteration_to_stop % epoch_length)