yolov5-deepsparse-blogpost

How do I resume training? Do I need to train for all 300 epochs to get a quantized model?

Open akashAD98 opened this issue 2 years ago • 4 comments

Setting resume=True in train.py solved the problem.

akashAD98 avatar Jun 23 '22 11:06 akashAD98

Do I need to complete all 300 epochs of training to get a quantized model? I get an error when I try to export the 150-epoch model, and when I resume training, it adds extra epochs. As you can see here, I set it to 300 epochs, but after I stopped and resumed training it showed 389 epochs.

image

akashAD98 avatar Jun 23 '22 12:06 akashAD98

I'm getting this error while converting to ONNX. What's wrong here? Do I need to train continuously without stopping?

image

akashAD98 avatar Jun 24 '22 06:06 akashAD98

Training has not even completed. With only 2 epochs remaining, it gives a CUDA out-of-memory error.

image image

akashAD98 avatar Jun 24 '22 08:06 akashAD98

Hi @akashAD98

Quantization only happens in the last 2 epochs of training. This is specified in the recipe file. So if you halt training before the quantization epoch, you will not get a quantized model.

The screenshot below shows where you can change the quantization epoch.

image

num_epochs is the total number of training epochs.

quantization_start_epoch is the exact epoch where quantization begins.
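To make the relationship between the two variables concrete, here is a minimal sketch in plain Python. It does not use the SparseML API. The recipe text, its values, and the helper names `parse_recipe_variables` and `checkpoint_is_quantized` are hypothetical and only for illustration. It also assumes epochs are counted from 0, so a run must go past `quantization_start_epoch` before any quantized epoch has been trained.

```python
# Hypothetical recipe variables. The values are illustrative (300 total epochs,
# quantization in the last 2) and are not copied from the actual recipe file.
RECIPE_TEXT = """
num_epochs: 300
quantization_start_epoch: 298
"""


def parse_recipe_variables(text):
    """Parse simple `key: number` lines into a dict of floats."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        try:
            variables[key.strip()] = float(value.strip())
        except ValueError:
            # Ignore non-numeric values in this sketch.
            continue
    return variables


def checkpoint_is_quantized(completed_epochs, variables):
    """Return True if training ran past the epoch where quantization begins.

    Assumption: epochs are 0-indexed, so after `quantization_start_epoch`
    completed epochs no quantized epoch has been trained yet.
    """
    return completed_epochs > variables["quantization_start_epoch"]


recipe_variables = parse_recipe_variables(RECIPE_TEXT)
print(checkpoint_is_quantized(150, recipe_variables))  # stopped early
print(checkpoint_is_quantized(300, recipe_variables))  # ran to completion
```

Under these assumptions, a checkpoint saved at 150 epochs has not reached the quantization stage, which matches the point above that stopping early does not produce a quantized model.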

dnth avatar Jun 24 '22 08:06 dnth