yolo3-keras: How long does training usually take?

I'm training on my own dataset: 321 images in total, epoch=500, batch_size=8 (with 10 I get out of memory). The startup log shows:

2020-09-30 09:26:59.030397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9484 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:07:00.0, compute capability: 6.1)
2020-09-30 09:26:59.033584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9484 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-09-30 09:26:59.036737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 9484 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:89:00.0, compute capability: 6.1)
2020-09-30 09:26:59.039634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 9484 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:8a:00.0, compute capability: 6.1)

This should mean all four GPUs are being used, right? So why does it still take about 10 hours to finish training? Also, what causes the loss to rise and fall periodically during training? Looking forward to your answer, thanks!

Sep 30 '20 09:09 Sherlock-hh

You're running 500 epochs... that's 50 epochs per hour, so only about a minute per epoch... is that really a long time?
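As a rough sanity check, the figures from the question can be worked through as below. This is only a sketch: it assumes all 321 images are used for training (no validation split) and that the 10 hours is pure training time, neither of which is stated in the thread.

```python
import math

# Figures quoted in the question
num_images = 321
batch_size = 8
epochs = 500
total_hours = 10  # "about 10 hours", so this is approximate

# Assumption: every image is used for training, so the last batch is partial
steps_per_epoch = math.ceil(num_images / batch_size)

minutes_per_epoch = total_hours * 60 / epochs
seconds_per_step = minutes_per_epoch * 60 / steps_per_epoch

print(f"steps per epoch:   {steps_per_epoch}")
print(f"minutes per epoch: {minutes_per_epoch:.1f}")
print(f"seconds per step:  {seconds_per_step:.2f}")
```

With a dataset this small, each epoch is only a few dozen steps, so fixed per-epoch overhead (data loading, validation, checkpoint writes) can account for a noticeable share of the total time.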

Oct 09 '20 01:10 bubbliiiing

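Mainly, I saw someone else train on a single GPU, also for 500 epochs, and finish in 5 hours, which made me nervous. Also, while this is running I can't freely open other software; as soon as I do, I get out of memory, and the loss keeps going up and down... Is there a way to save while training, so that I can stop when the loss is relatively low?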
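For reference, saving during training and stopping once the loss stops improving is typically done in Keras with callbacks. The following is a minimal sketch, not this repository's actual training code: the helper name `build_callbacks`, the `logs/` directory, the file name, and the `patience` value are all assumptions chosen for illustration. Monitoring `val_loss` also assumes validation data is passed to `fit`.

```python
def build_callbacks(log_dir="logs/"):
    """Return callbacks that keep the best weights and stop when val_loss stalls."""
    # Imported here so TensorFlow is only required when training actually starts.
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    # Overwrite the file only when val_loss improves (hypothetical path).
    checkpoint = ModelCheckpoint(
        log_dir + "best.weights.h5",
        monitor="val_loss",
        save_weights_only=True,
        save_best_only=True,
        verbose=1,
    )

    # Stop if val_loss has not improved for 10 consecutive epochs
    # (10 is an arbitrary illustrative value).
    early_stop = EarlyStopping(monitor="val_loss", patience=10, verbose=1)

    return [checkpoint, early_stop]


# Usage (model and data are assumed to exist elsewhere):
# model.fit(train_data, validation_data=val_data, epochs=500,
#           callbacks=build_callbacks())
```

Because the best weights are written to disk as training goes, an interrupted run still leaves a usable file behind.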

Oct 09 '20 01:10 Sherlock-hh

1. More GPUs are not necessarily faster than fewer GPUs. 2. Doesn't it already save during training anyway?
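If weights are already being saved during training, one way to "stop at a low loss" after the fact is simply to pick the saved file with the lowest validation loss. The sketch below assumes a hypothetical naming scheme in which each file name ends in `val_loss<value>.h5`; the file names in the example are made up for illustration and are not taken from a real run.

```python
import re

# Hypothetical naming scheme: "...val_loss<number>.h5"
PATTERN = re.compile(r"val_loss(\d+(?:\.\d+)?)\.h5$")


def best_checkpoint(filenames):
    """Return the file name with the lowest val_loss, or None if none match."""
    scored = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            scored.append((float(match.group(1)), name))
    if not scored:
        return None
    # Tuples compare by their first element, so min() picks the lowest val_loss.
    return min(scored)[1]


# Made-up file names, for illustration only
example = [
    "ep010-loss25.310-val_loss27.842.h5",
    "ep120-loss14.102-val_loss15.977.h5",
    "ep300-loss11.450-val_loss16.203.h5",
]
print(best_checkpoint(example))
```

In practice the list would come from something like `os.listdir` on the checkpoint directory.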

Oct 16 '20 05:10 bubbliiiing

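Thanks for the reply. Training has finished and the model was saved. (Earlier the problem was that the workstation isn't mine alone: whenever someone else ran a job, my code would report insufficient GPU memory and stop, so I kept having to start over.) But at test time not a single bounding box is output, orz. What is usually the cause of this? (I have already updated the paths.)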
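On the shared-workstation problem mentioned in parentheses: by default TensorFlow creates a device on every GPU it can see and reserves memory on each one, as the startup log in the first post shows, whether or not the training step actually uses them all. One common way to leave the other cards free is to restrict which GPUs the process can see. A minimal sketch, where the choice of GPU index 0 is arbitrary:

```python
import os

# Must run before TensorFlow is imported, otherwise it has no effect.
# "0" is an arbitrary choice; any index shown in the startup log would do.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import TensorFlow only after this point
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell by setting the variable before launching the training script.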

Oct 16 '20 05:10 Sherlock-hh

https://blog.csdn.net/weixin_44791964/article/details/107517428

Oct 16 '20 06:10 bubbliiiing

OK, thanks, I'll go check.

Oct 16 '20 06:10 Sherlock-hh
