mindocr
Add resume training, checkpoint saving, and training-log saving; enrich the loss output; adapt evaluation during training to eval_start_epoch and eval_interval
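Description of changes:
Resume training: training can continue from the epoch at which it was interrupted.
Checkpoint saving: a ckpt is saved at the end of every epoch.
Log saving:
During training, the logger module replaces print output, and the logs are also saved to a log file.
Loss output in the non-sink scenario (the step interval between printed lines is configurable):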
2023-02-17 11:43:14,405:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 1.487435, per step time: 3096.616 ms, fps: 16.54 img/s
2023-02-17 11:43:37,711:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 1.251532, per step time: 751.433 ms, fps: 14.97 img/s
2023-02-17 11:43:41,012:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 1.079233, per step time: 481.662 ms, fps: 16.00 img/s
2023-02-17 11:43:44,326:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 0.981760, per step time: 462.071 ms, fps: 15.39 img/s
2023-02-17 11:43:47,646:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 0.898887, per step time: 426.740 ms, fps: 14.89 img/s
2023-02-17 11:43:50,943:INFO:epoch: [1/10] step: [10/85], lr: 0.001000, loss: 0.803308, per step time: 450.666 ms, fps: 16.70 img/s
Loss output in the data-sink scenario:
2023-02-17 14:27:29,405:INFO:epoch: [1/90] loss: 1.082604, epoch time: 40.559 s, per step time: 207.995 ms, fps: 16.54 img/s
2023-02-17 14:27:31,711:INFO:epoch: [2/90] loss: 1.045892, epoch time: 2.413 s, per step time: 12.377 ms, fps: 14.97 img/s
2023-02-17 14:27:34,012:INFO:epoch: [3/90] loss: 0.729006, epoch time: 2.486 s, per step time: 12.750 ms, fps: 16.00 img/s
2023-02-17 14:27:36,326:INFO:epoch: [4/90] loss: 0.766412, epoch time: 2.443 s, per step time: 12.529 ms, fps: 15.39 img/s
2023-02-17 14:27:39,646:INFO:epoch: [5/90] loss: 0.655058, epoch time: 2.851 s, per step time: 14.621 ms, fps: 16.70 img/s
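The log prefix shown above (timestamp, level, message separated by colons) matches what Python's standard logging module produces with the format string `%(asctime)s:%(levelname)s:%(message)s`. A minimal sketch of such a setup follows. The names `get_logger`, `format_step`, and `train.log` are hypothetical and are not taken from this PR:

```python
import logging
import os


def get_logger(log_dir: str, name: str = "train") -> logging.Logger:
    """Return a logger that writes to the console and to <log_dir>/train.log.

    Hypothetical helper for illustration; the PR's own logger may differ.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
        handlers = (
            logging.StreamHandler(),
            logging.FileHandler(os.path.join(log_dir, "train.log")),
        )
        for handler in handlers:
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def format_step(epoch, num_epochs, step, num_steps, lr, loss, step_ms, fps):
    """Build the per-step message used in the non-sink scenario."""
    return (
        f"epoch: [{epoch}/{num_epochs}] step: [{step}/{num_steps}], "
        f"lr: {lr:.6f}, loss: {loss:.6f}, "
        f"per step time: {step_ms:.3f} ms, fps: {fps:.2f} img/s"
    )
```

Calling `get_logger("./logs").info(format_step(...))` every N steps would give the configurable print interval described above.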
Evaluation during training: you can choose the epoch at which evaluation starts and how many epochs elapse between evaluations, as needed.
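How resume training and the evaluation schedule interact can be sketched as below. This assumes 1-based epochs and that evaluation runs at eval_start_epoch and then every eval_interval epochs. The PR does not spell out these semantics, and the function names `should_eval` and `plan_epochs` are hypothetical:

```python
def should_eval(epoch: int, eval_start_epoch: int, eval_interval: int) -> bool:
    """Return True if evaluation should run after this (1-based) epoch."""
    if eval_interval <= 0 or epoch < eval_start_epoch:
        return False
    return (epoch - eval_start_epoch) % eval_interval == 0


def plan_epochs(num_epochs, last_finished_epoch=0,
                eval_start_epoch=1, eval_interval=1):
    """List (epoch, do_eval) for the epochs that still need to run.

    last_finished_epoch is the epoch recorded in the checkpoint being
    resumed from; 0 means training starts from scratch.
    """
    return [
        (epoch, should_eval(epoch, eval_start_epoch, eval_interval))
        for epoch in range(last_finished_epoch + 1, num_epochs + 1)
    ]
```

For example, resuming a 10-epoch run after epoch 3 with eval_start_epoch=5 and eval_interval=2 would train epochs 4 through 10 and evaluate after epochs 5, 7, and 9.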
Thanks.
Checkpoint saving: a ckpt is saved at the end of every epoch. This could optionally support a last_k or top_k saving strategy.
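One possible shape for the suggested last_k / top_k strategy is sketched below. It is a framework-independent illustration, not the code in this PR: the class name `CheckpointPolicy` is hypothetical, and for top_k it assumes a higher metric is better. The policy only decides which files to drop; the caller saves and deletes the actual ckpt files.

```python
from typing import List, Optional, Tuple


class CheckpointPolicy:
    """Track saved checkpoints and report which ones to delete."""

    def __init__(self, k: int, mode: str = "last_k"):
        if mode not in ("last_k", "top_k"):
            raise ValueError("mode must be 'last_k' or 'top_k'")
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.mode = mode
        self.kept: List[Tuple[float, int, str]] = []  # (metric, epoch, path)

    def update(self, epoch: int, path: str,
               metric: Optional[float] = None) -> List[str]:
        """Register a newly saved checkpoint; return paths to delete."""
        if self.mode == "top_k" and metric is None:
            raise ValueError("top_k requires a metric for every checkpoint")
        self.kept.append((0.0 if metric is None else metric, epoch, path))
        if self.mode == "last_k":
            # Newest epochs first.
            self.kept.sort(key=lambda rec: rec[1], reverse=True)
        else:
            # Best metric first; ties favor the newer epoch.
            self.kept.sort(key=lambda rec: (rec[0], rec[1]), reverse=True)
        evicted = self.kept[self.k:]
        self.kept = self.kept[:self.k]
        return [rec[2] for rec in evicted]
```

With k=2 and mode "top_k", saving checkpoints with metrics 0.5, 0.7, and 0.6 in that order would evict the first one, since it has the lowest score of the three.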