super-gradients
How to resume training on the previous checkpoint
💡 Your Question
Hi Team, thanks for giving us this masterpiece. I trained a model successfully using the tutorials you provided, and now I have a model in ./checkpoints/exp1/ckpt_best.pth. I want to load this model and resume the training. I tried to use super_gradients.training.models.get(model_path), but it expects a common model name, e.g. 'yolo_nas_l'. Can you help me?
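For simply loading trained weights (as opposed to resuming a run), `models.get` in SuperGradients takes the architecture name together with `num_classes` and a `checkpoint_path`, so the checkpoint is loaded into a freshly built model of that architecture. The sketch below assumes the model was trained as `yolo_nas_l`; `build_get_kwargs`, `load_trained_model` and `NUM_CLASSES` are hypothetical names, and the super_gradients import is deferred so the helper can be inspected without the library installed.

```python
from pathlib import Path


def build_get_kwargs(arch: str, num_classes: int, ckpt_path: str) -> dict:
    """Collect the keyword arguments for models.get() (hypothetical helper)."""
    path = Path(ckpt_path)
    if path.suffix != ".pth":
        raise ValueError(f"expected a .pth checkpoint, got {path}")
    return {
        "model_name": arch,          # must match the architecture used in training
        "num_classes": num_classes,  # must match the trained detection head
        "checkpoint_path": str(path),
    }


def load_trained_model(arch: str, num_classes: int, ckpt_path: str):
    """Build the architecture and load the trained weights into it."""
    # Deferred import: requires super-gradients to be installed
    from super_gradients.training import models

    return models.get(**build_get_kwargs(arch, num_classes, ckpt_path))


# Usage (assumes the checkpoint from the question exists; NUM_CLASSES is hypothetical):
# model = load_trained_model("yolo_nas_l", NUM_CLASSES, "./checkpoints/exp1/ckpt_best.pth")
```

The architecture name is still required because the checkpoint stores weights, not the model definition.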
Versions
No response
Set the "resume" parameter to True in the training params, and initialize the Trainer with the same experiment_name (and checkpoint root directory) as the original run:
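from super_gradients.training import Trainer

# Re-create the Trainer with the same experiment_name and ckpt_root_dir as the original run
trainer = Trainer(experiment_name="exp1", ckpt_root_dir="./checkpoints")

train_params = {
    "resume": True,
    # ...plus the rest of the training params used in the original run
}
trainer.train(model=model,
              training_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)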
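With "resume" enabled, the Trainer looks for the experiment's latest checkpoint, which SuperGradients saves as `ckpt_latest.pth` next to `ckpt_best.pth` under `<ckpt_root_dir>/<experiment_name>`; newer releases may nest it in a per-run subfolder, so treat the exact layout as an assumption for your installed version. Before resuming, it can help to confirm that the file is actually there. `find_latest_checkpoint` below is a hypothetical helper that searches the experiment folder recursively and returns the most recently written `ckpt_latest.pth`.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(ckpt_root_dir: str, experiment_name: str) -> Optional[Path]:
    """Return the newest ckpt_latest.pth under <ckpt_root_dir>/<experiment_name>, or None."""
    exp_dir = Path(ckpt_root_dir) / experiment_name
    if not exp_dir.is_dir():
        return None
    # rglob also covers per-run subfolders, if the installed version creates them
    candidates = [p for p in exp_dir.rglob("ckpt_latest.pth") if p.is_file()]
    if not candidates:
        return None
    # Pick the most recently modified checkpoint
    return max(candidates, key=lambda p: p.stat().st_mtime)


ckpt = find_latest_checkpoint("./checkpoints", "exp1")
print(ckpt if ckpt else "no ckpt_latest.pth found under ./checkpoints/exp1")
```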
Hello @binrey
I trained the model using these params:
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

train_params = {
    "average_best_models": True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    # Training for 100 epochs
    "max_epochs": 100,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        # NOTE: num_classes needs to be defined here
        num_classes=config.NUM_CLASSES,
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            # NOTE: num_classes needs to be defined here
            num_cls=config.NUM_CLASSES,
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}
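To see what the warmup and cosine settings above imply, here is a per-epoch sketch of a standard linear-warmup plus cosine-decay schedule using the same numbers (warmup_initial_lr=1e-6, initial_lr=5e-4, lr_warmup_epochs=3, cosine_final_lr_ratio=0.1, max_epochs=100). It is a generic formula, not SuperGradients' exact implementation, which may update the rate per iteration and differ in edge details; `lr_at_epoch` is a hypothetical name.

```python
import math


def lr_at_epoch(epoch: float,
                initial_lr: float = 5e-4,
                warmup_initial_lr: float = 1e-6,
                warmup_epochs: int = 3,
                max_epochs: int = 100,
                final_lr_ratio: float = 0.1) -> float:
    """Generic linear-warmup + cosine-decay learning rate at a given epoch."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_initial_lr up to initial_lr
        return warmup_initial_lr + (initial_lr - warmup_initial_lr) * epoch / warmup_epochs
    final_lr = initial_lr * final_lr_ratio
    # Fraction of the post-warmup epochs completed, clamped at 1.0
    progress = min((epoch - warmup_epochs) / (max_epochs - warmup_epochs), 1.0)
    # Cosine curve from initial_lr (progress=0) down to final_lr (progress=1)
    return final_lr + 0.5 * (initial_lr - final_lr) * (1 + math.cos(math.pi * progress))


for e in (0, 3, 50, 100):
    print(e, f"{lr_at_epoch(e):.2e}")
```

Under this formula the rate at epoch 100 sits at the floor of 5e-5. If a resumed run recomputed the schedule with max_epochs=150, the rate at epoch 100 would jump back up to roughly 1.7e-4 (`lr_at_epoch(100, max_epochs=150)`). Whether a resumed SuperGradients run behaves exactly this way is an assumption worth checking against the logged learning rate.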
The training has finished, but I think I need to train for another 50-100 epochs. I want to resume from last_ckpt.pth. What should I do? Also, when I start the training again, what will happen to the learning rate? How does it work?
Thank you so much, appreciate it.
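For training another 50-100 epochs, one option consistent with the "resume" reply is to reuse the original parameters, enable "resume", optionally point `resume_path` (a SuperGradients training parameter for an explicit checkpoint file) at a specific checkpoint, and raise `max_epochs`. The sketch below only builds the new dict; `extend_training` is a hypothetical helper, and whether the installed version accepts a larger `max_epochs` on resume is an assumption to verify.

```python
from typing import Optional


def extend_training(base_params: dict, extra_epochs: int,
                    ckpt_path: Optional[str] = None) -> dict:
    """Return a copy of base_params set up to continue training (hypothetical helper)."""
    if extra_epochs <= 0:
        raise ValueError("extra_epochs must be positive")
    params = dict(base_params)  # shallow copy: loss/metric objects are shared, not cloned
    params["resume"] = True
    params["max_epochs"] = base_params["max_epochs"] + extra_epochs
    if ckpt_path is not None:
        # Explicit checkpoint instead of the experiment's default latest checkpoint
        params["resume_path"] = ckpt_path
    return params


# Usage with a trimmed-down version of the params from the question:
original = {"initial_lr": 5e-4, "lr_mode": "cosine", "max_epochs": 100}
continued = extend_training(original, extra_epochs=50)
print(continued["max_epochs"], continued["resume"])  # 150 True
```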
@hassanbadawy is your issue fixed?
Hi Deci, thank you for your support; it works well. Regards, Hasan
From: Minh-Tu Cao. Sent: Saturday, July 1, 2023 7:11:40 PM. To: Deci-AI/super-gradients. Cc: Hassan Badawy; Mention. Subject: Re: [Deci-AI/super-gradients] How to resume training on the previous checkpoint (Issue #1139)
Hi @binrey @hassanbadawy @Minh-Tu-Cao @AlimTleuliyev !
Thanks for coming to each other's aid on this issue. I'm gathering some feedback on SuperGradients and YOLO-NAS.
Would you be down for a quick call to chat about your experience?
If a call doesn't work for you, no worries. I've got a short survey you could fill out: https://bit.ly/sgyn-feedback.
I know you’re super busy, but your input will help us shape the direction of SuperGradients and make it as useful as possible for you.
I appreciate your time and feedback. Let me know what works for you.
Cheers,
Harpreet