yolo3-keras
How long does training usually take?
I'm training on my own dataset: 321 images in total, epochs=500, batch_size=8 (with batch_size=10 it runs out of memory). The log shows:
2020-09-30 09:26:59.030397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9484 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:07:00.0, compute capability: 6.1)
2020-09-30 09:26:59.033584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9484 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-09-30 09:26:59.036737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 9484 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:89:00.0, compute capability: 6.1)
2020-09-30 09:26:59.039634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 9484 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:8a:00.0, compute capability: 6.1)
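This log means all 4 GPUs are being used, right? So why does training still take about 10 hours?
Also, the loss rises and falls periodically during training.
What could be causing this? Looking forward to your answer, thanks!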
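A note on the log above: TensorFlow prints a "Created TensorFlow device" line for every GPU it can see, and TensorFlow 1.x by default reserves most of the available memory on each one (here 9484 MB per card). That by itself does not mean training is spread across all four cards; without an explicit multi-GPU setup, TensorFlow places the model on the first GPU. The memory reservation also helps explain why other programs, or other users' jobs on a shared workstation, run out of memory. One common workaround is to limit which GPUs the process can see with the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported; in TF 1.x, setting `gpu_options.allow_growth = True` on the session config additionally stops it from grabbing memory up front. The sketch below covers only the environment-variable part. The helper `pin_to_gpus` is hypothetical, and the example assumes you want to pin the job to physical GPU 0.

```python
import os


def pin_to_gpus(gpu_ids):
    """Restrict which CUDA devices this process can see (hypothetical helper).

    Must be called before TensorFlow/Keras is imported; once TensorFlow
    has enumerated the GPUs, changing the variable has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Example: make only physical GPU 0 visible, then import TensorFlow.
pin_to_gpus([0])
# import tensorflow as tf  # import only after pinning
```

Inside the process the visible card is then renumbered as GPU:0, so the other three cards stay free for other users.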
You're running 500 epochs... that's 50 epochs an hour, roughly a minute per epoch... is that really long?
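The arithmetic behind that reply can be checked quickly. This is a back-of-the-envelope sketch: `epoch_timing` is a hypothetical helper, and the step count assumes all 321 images are used for training with a partial last batch counted as a step. If part of the data is held out for validation, the real number of steps per epoch is lower.

```python
import math


def epoch_timing(total_hours, epochs, num_images, batch_size):
    """Rough training-time figures from total wall-clock time (hypothetical helper)."""
    epochs_per_hour = epochs / total_hours
    seconds_per_epoch = total_hours * 3600 / epochs
    # Assumes every image is a training image and a partial last batch counts as a step.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return epochs_per_hour, seconds_per_epoch, steps_per_epoch


print(epoch_timing(10, 500, 321, 8))  # (50.0, 72.0, 41)
```

So 10 hours for 500 epochs works out to about 72 seconds per epoch, or under 2 seconds per training step.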
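The thing is, I saw someone else train for 500 epochs on a single GPU and finish in 5 hours, which made me pretty anxious. Also, while this is running I can't open any other software; as soon as I do, I get out of memory. And the loss keeps going up and down... Is there a way to save while training, so I can stop at a point where the loss is relatively low?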
1. Multiple GPUs are not necessarily faster than fewer GPUs. 2. Doesn't it already save during training?
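In Keras, the usual way to keep the weights from the lowest-loss epoch is the `ModelCheckpoint` callback with `monitor='val_loss'` and `save_best_only=True`. Alternatively, a filepath pattern containing `{epoch}` with `save_best_only=False` keeps one file per epoch, so you can pick a low-loss one afterwards. This thread does not say exactly how this repository's training script configures the callback. The pure-Python sketch below only illustrates the rule a save-best-only policy applies. `select_checkpoints` is a hypothetical helper, and the loss values are made up for illustration.

```python
def select_checkpoints(val_losses):
    """Return the epochs at which a save-best-only policy would write weights.

    Same rule as Keras ModelCheckpoint(monitor='val_loss', mode='min',
    save_best_only=True): save only when the monitored loss improves.
    """
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement over the best loss so far
            best = loss
            saved.append(epoch)
    return saved, best


# Made-up, fluctuating validation losses for five epochs.
print(select_checkpoints([3.2, 2.9, 3.1, 2.7, 2.8]))  # ([1, 2, 4], 2.7)
```

With a fluctuating loss, the checkpoints on disk only ever move to a lower loss, so stopping at any time leaves you with the best weights seen so far.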
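Thanks for the reply! Training has finished and the model was saved. (Earlier, the problem was that the workstation is shared: whenever someone else ran their code, mine would stop with a not-enough-GPU error and I had to start over.) But at test time not a single bounding box is output orz. What usually causes this? (I have already updated the paths.)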
https://blog.csdn.net/weixin_44791964/article/details/107517428
Okay, thanks, I'll go check.