yolo3-keras

How long does training usually take?

Open Sherlock-hh opened this issue 5 years ago • 6 comments

I'm training on my own dataset: 321 images in total, epoch=500, batch_size=8 (10 gives an out-of-memory error).

2020-09-30 09:26:59.030397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9484 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:07:00.0, compute capability: 6.1)
2020-09-30 09:26:59.033584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9484 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-09-30 09:26:59.036737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 9484 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:89:00.0, compute capability: 6.1)
2020-09-30 09:26:59.039634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 9484 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:8a:00.0, compute capability: 6.1)

This log should mean all four GPUs are in use, right? So why does it take me about 10 hours to finish training? The loss also rises and falls periodically during training:

[image]

What causes this? Looking forward to your reply, thanks!

Sherlock-hh • Sep 30 '20 09:09
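
The log lines in the question show that TensorFlow created a device for each of the four GPUs and reserved memory on them; on their own they do not show that training is spread across all four. A minimal sketch, assuming a shared workstation where only one card should be claimed, of restricting the process to chosen GPUs through the `CUDA_VISIBLE_DEVICES` environment variable. The helper name `use_gpus` is hypothetical and not part of the repository, and it has to run before TensorFlow is imported.

```python
import os


def use_gpus(indices):
    """Restrict this process to the given GPU indices (hypothetical helper).

    Call this before importing TensorFlow or any other CUDA library,
    because the CUDA runtime reads CUDA_VISIBLE_DEVICES when it initializes.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Claim only the first GPU; the other three stay free for other users.
use_gpus([0])

# import tensorflow as tf  # import only after the variable has been set
```

With a single visible GPU, the remaining cards are not touched by this process, which may also help when several people share the same machine.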

You're running 500 epochs... that's 50 epochs an hour, about a minute per epoch... is that really so long?

bubbliiiing • Oct 09 '20 01:10
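
The arithmetic behind this estimate can be sketched from the numbers given in the question. This assumes all 321 images are used for training in every epoch (no validation split) and takes "about 10 hours" as exactly 10, so the per-step figure is only a rough guide.

```python
import math

num_images = 321   # dataset size from the question
batch_size = 8     # largest batch size that fit in memory
epochs = 500
total_hours = 10   # "about 10 hours", as reported

# Batches per epoch, assuming every image is used for training.
steps_per_epoch = math.ceil(num_images / batch_size)

epochs_per_hour = epochs / total_hours
seconds_per_epoch = total_hours * 3600 / epochs
seconds_per_step = seconds_per_epoch / steps_per_epoch

print(f"steps per epoch:   {steps_per_epoch}")
print(f"epochs per hour:   {epochs_per_hour:.0f}")
print(f"seconds per epoch: {seconds_per_epoch:.0f}")
print(f"seconds per step:  {seconds_per_step:.2f}")
```

Under these assumptions the run works out to 41 steps and roughly 72 seconds per epoch.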

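Mainly because I saw someone else train on a single GPU, also for 500 epochs, and they finished in 5 hours, which made me pretty nervous. Also, while this is running I can't open any other software; as soon as I do, I get out of memory. And the loss keeps going up and down. Is there a way to save the model while training, so that I can stop at a point where the loss is relatively low?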

Sherlock-hh • Oct 09 '20 01:10

1. More GPUs are not necessarily faster than fewer GPUs. 2. Doesn't it already save checkpoints as it trains?

bubbliiiing • Oct 16 '20 05:10
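
As the reply suggests, Keras training scripts normally write weights to disk while they run. A minimal sketch of how that is commonly set up with Keras callbacks, assuming `tf.keras` is available. The names `checkpoint_path` and `build_callbacks`, the `logs` directory, and the patience value are illustrative assumptions, not taken from the repository's training script.

```python
import os


def checkpoint_path(log_dir):
    """Filename template; Keras fills in epoch, loss and val_loss."""
    return os.path.join(
        log_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.weights.h5"
    )


def build_callbacks(log_dir="logs", patience=10):
    # Imported lazily so checkpoint_path() works without TensorFlow installed.
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    os.makedirs(log_dir, exist_ok=True)
    checkpoint = ModelCheckpoint(
        checkpoint_path(log_dir),
        monitor="val_loss",
        save_weights_only=True,
        save_best_only=True,  # write a file only when val_loss improves
    )
    # Stop once val_loss has not improved for `patience` epochs.
    early_stop = EarlyStopping(monitor="val_loss", patience=patience, verbose=1)
    return [checkpoint, early_stop]


# Usage (model and data generators are assumed to exist already):
# model.fit(train_data, validation_data=val_data, epochs=500,
#           callbacks=build_callbacks("logs"))
```

With `save_best_only=True`, an interrupted run still leaves the best weights seen so far on disk, where they can be reloaded instead of starting over.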

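Thanks for the reply. Training has finished and the model was saved. (The earlier problem was that the workstation isn't mine alone: whenever someone else ran something, my code would stop with an insufficient-GPU-memory error, and I kept having to start over.) But at test time not a single bounding box is output, orz. What usually causes this? (I have already updated the paths.)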

Sherlock-hh • Oct 16 '20 05:10

https://blog.csdn.net/weixin_44791964/article/details/107517428

bubbliiiing • Oct 16 '20 06:10

OK, thanks, I'll go check.

Sherlock-hh • Oct 16 '20 06:10