caffe-jacinto training memory problem

Open KurtKoo opened this issue 4 years ago • 2 comments

Hello! I am trying to train SSD using caffe-jacinto on a GeForce 940MX GPU (2002 MB available).

At first, I ran the training script (voc0712, 256*256) with batch_size == 16 and it failed. The training log said the GPU could not allocate enough memory. Then I trained on a much smaller dataset and it failed as well.

However, it can train with batch_size == 2 on both datasets. Is there a way to train with batch_size == 16?

Thanks!

KurtKoo avatar Jun 04 '21 13:06 KurtKoo

2 GB of GPU memory is quite small. But you can try using fp16 for training, which might let you double the batch size. Try adding the following to your config and then train:

fp16 = dict(loss_scale=512.)
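
For readers wondering why the fp16 setting above carries a loss-scale value of 512, here is a minimal sketch of the general idea behind loss scaling. It uses only the Python standard library and is not taken from caffe-jacinto or any training framework. The helper `to_fp16` and the example gradient value `1e-8` are hypothetical, chosen only to illustrate half-precision underflow.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration: it packs the value as a 16-bit
    float ('e' format) and unpacks it again.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

LOSS_SCALE = 512.0   # same value as in the fp16 setting above
grad = 1e-8          # assumed tiny gradient, below fp16's smallest subnormal

# Stored directly in fp16, the gradient underflows to zero and is lost.
unscaled = to_fp16(grad)

# Scaling the loss scales every gradient by the same factor, which lifts
# this one into fp16's representable range.
scaled = to_fp16(grad * LOSS_SCALE)

# Dividing by the scale in higher precision recovers the original magnitude.
recovered = scaled / LOSS_SCALE

print(unscaled, scaled, recovered)
```

Without scaling the small gradient is stored as zero, while the scaled copy survives and can be divided back before the weight update. How a given framework applies this internally may differ from the sketch.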

mathmanu avatar Jun 04 '21 13:06 mathmanu

Thanks for your advice!

However, I'm still not familiar with the caffe-jacinto project. Could you please tell me exactly which 'config' file you mean?

KurtKoo avatar Jun 06 '21 13:06 KurtKoo