transformers
Epoch 0 appears twice in the callback: the epoch counter is updated to 2 instead of 1
System Info
Google Colab, transformers 4.26
Who can help?
@sta
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
```python
from typing import Dict

# Trainer was missing from the original imports
from transformers import (
    Trainer,
    TrainerCallback,
    TrainerControl,
    TrainerState,
    TrainingArguments,
)


class DebugCallback(TrainerCallback):
    def on_train_begin(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs: Dict,
    ) -> None:
        print("training start")

    def on_evaluate(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs: Dict,
    ) -> None:
        print("valiidation")

    def on_epoch_begin(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs: Dict,
    ) -> None:
        self.state = state
        print("epoch", state.epoch)
        print("train")

    def on_epoch_end(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs: Dict,
    ) -> None:
        print("val")
        print("epoch", state.epoch)

    def on_train_end(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs: Dict,
    ) -> None:
        print("test")


dc = DebugCallback()
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=food["train"].remove_columns("id"),
    eval_dataset=food["test"].remove_columns("id"),
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    callbacks=[dc],
)
```
Expected behavior
Expected outcome:

```
epoch 0
epoch 1
epoch 2
```

Actual outcome:

```
epoch 0
train
val
epoch 0.992
valiidation
epoch 0
train
val
epoch 1.992
valiidation
epoch 1
train
```
So epoch 0 gets logged twice?
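One possible explanation (an assumption on my part, not confirmed for this report): `Trainer` tracks `state.epoch` as a float, roughly completed optimizer steps divided by the number of update steps per epoch. When that per-epoch step count is not a whole number (for example under gradient accumulation), the value observed at `on_epoch_end` lands just below the integer, which would produce readings like `0.992`. A minimal arithmetic sketch:

```python
# Sketch (assumption): state.epoch is the completed optimizer steps divided
# by the (possibly fractional) number of update steps per epoch.
def fractional_epoch(global_step: int, update_steps_per_epoch: float) -> float:
    """Return the epoch value a callback would observe at this step."""
    return global_step / update_steps_per_epoch


# Example: 125 batches per epoch with gradient accumulation of 4 gives
# 125 / 4 = 31.25 update steps per epoch, but only 31 optimizer steps run,
# so the epoch "ends" at 31 / 31.25 = 0.992 rather than 1.0.
print(fractional_epoch(31, 125 / 4))  # 0.992
```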
Hey, I'm a research engineer working on language modelling and I'd like to contribute to open source. I was wondering if I could give this one a shot?
@jackapbutler go for it :)
Hey @franz101, I'm unable to reproduce the issue you've highlighted. Do you have access to the `TrainingArguments` that you used for this experiment? When I run the code I get the following expected output:
```
[I 230214 16:26:49 yo:27] training start
[I 230214 16:26:49 yo:46] train
[I 230214 16:26:49 yo:47] epoch 0
[I 230214 16:27:06 yo:56] val
[I 230214 16:27:06 yo:57] epoch 0.96
[I 230214 16:27:12 yo:36] valiidation
[I 230214 16:27:13 yo:46] train
[I 230214 16:27:13 yo:47] epoch 0.96
[I 230214 16:27:29 yo:56] val
[I 230214 16:27:29 yo:57] epoch 1.96
[I 230214 16:27:36 yo:36] valiidation
[I 230214 16:27:36 yo:46] train
[I 230214 16:27:36 yo:47] epoch 1.96
[I 230214 16:27:53 yo:56] val
[I 230214 16:27:53 yo:57] epoch 2.96
[I 230214 16:28:00 yo:36] valiidation
[I 230214 16:28:00 yo:66] test
```
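If the aim is clean integer epoch numbers in the callback output, one workaround (my suggestion, not an official transformers API) is to round the fractional `state.epoch` before printing:

```python
def epoch_number(fractional_epoch: float) -> int:
    """Round Trainer's fractional state.epoch to the nearest whole epoch."""
    return round(fractional_epoch)


# In a callback this could be: print("epoch", epoch_number(state.epoch))
print(epoch_number(0.96))  # 1
print(epoch_number(1.96))  # 2
print(epoch_number(0.0))   # 0
```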
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.