Schro Heaven
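Hello, I ran into the following problems during multi-GPU training:
1. I set os.environ["CUDA_VISIBLE_DEVICES"]="4,5,6,7" to specify which GPU IDs are visible, but only the first GPU is ever used (the first one specified, ID 4 here).
2. If I do not specify the visible GPUs, all GPUs are loaded by default (8 GPUs in total), but only the first GPU has its memory filled and takes part in computation; the other GPUs do no computation.
3. CPU usage is also very heavy: all 32 cores are saturated (almost every core is at 100%).
4. Training on a single GPU is very slow, and I do not know whether my code is at fault. The GPU is fully loaded and appears to be running at high utilization, but CPU usage is also at 100%. Training on the coco2017 dataset with one 1080Ti GPU takes more than 12 hours (I changed batch_size to 16 and the initial learning_rate to 1e-4).
5. I tried the multi-GPU example from the official TensorFlow site. If I use os.environ["CUDA_VISIBLE_DEVICES"] to specify the visible IDs, again only the first specified GPU is used. If I do not specify them, all 8 GPUs are loaded and all of them take part in computation.
Screenshot of the multi-GPU setup code: [screenshot "123"]
GPU usage shown by nvidia-smi: [screenshot "456"]
Screenshot of the running computation: [screenshot "789"]
In addition, an error is reported when the last epoch finishes; screenshot: [screenshot "4444"]
My understanding is that this error is raised because training has terminated.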
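For reference, one thing that may be worth checking with the CUDA_VISIBLE_DEVICES setting above: CUDA renumbers the visible devices from 0 inside the process, so with "4,5,6,7" the program sees four GPUs indexed 0 to 3, and the variable has to be set before the framework initializes CUDA. The sketch below only illustrates that renumbering; `logical_to_physical` is a hypothetical helper introduced here, not part of TensorFlow or of the project in question, and it does not diagnose the problem by itself.

```python
import os

# Assumption: this runs at the very top of the training script, before
# TensorFlow (or any other CUDA-using library) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"


def logical_to_physical(env_value):
    """Hypothetical helper: map in-process GPU indices to the IDs listed
    in CUDA_VISIBLE_DEVICES (index 0 is the first listed ID, and so on)."""
    ids = [s.strip() for s in env_value.split(",") if s.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
for logical, physical in mapping.items():
    # Inside the process, devices are addressed by the logical index.
    print(f"logical GPU {logical} -> listed ID {physical}")
```

If the training code places operations on a device by name, the names would need to use the logical indices (0 to 3 in this case) rather than 4 to 7.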
I trained D0 and D3 with this code, but the accuracy is only 0.2 for both. The dataset I used is VOC2012+2017. I also tested the D0 pretrained model you provided on person detection, but many people are not detected. [screenshot "1 (3)"]
Thank you for your work. I have some questions about your project code: I looked at the model you called YoloV5 in the middle, but did not see the operation...
I have a few questions. 1. Why are training and testing divided into the following two steps and then fused? I saw the data only train_bone should only be the bone...
I got the following error after deployment. Do you have any suggestions for solving it? Traceback (most recent call last): File "/home/basic/workspace/Scheaven/05_cpn/keras_cpn/preprocessing/generator.py", line 28, in data_generator imgs = data_res[0] TypeError:...