DSB2017
About the detector test
@lfz Hello. When I train the network, the detector training stage runs fine, but running the test fails with an out-of-memory error, even though training completes without problems. I also tried reducing the batch_size for the test, but it still fails. This is the loop inside the test function:

    for i in range(len(splitlist)-1):
        input = Variable(data[splitlist[i]:splitlist[i+1]], volatile = True).cuda()
        inputcoord = Variable(coord[splitlist[i]:splitlist[i+1]], volatile = True).cuda()
        if isfeat:
            output,feature = net(input,inputcoord)
            featurelist.append(feature.data.cpu().numpy())
        else:
            output = net(input,inputcoord)
        outputlist.append(output.data.cpu().numpy())
It runs for i=0 and raises the error at i=1.
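The loop quoted above walks through consecutive slices of data delimited by splitlist. How splitlist is built is not shown in this thread, so the following is only a sketch under the assumption that it holds evenly spaced chunk boundaries; make_splitlist, iter_chunks and n_per_run are hypothetical names used for illustration. If that assumption holds, the number of patches sent to the GPU per step depends on the chunk size rather than on the -b option, which might explain why lowering batch_size did not help.

```python
def make_splitlist(n_items, n_per_run):
    """Return chunk boundaries [0, n_per_run, 2*n_per_run, ..., n_items].

    Hypothetical helper: the thread does not show how splitlist is built.
    """
    if n_per_run <= 0:
        raise ValueError("n_per_run must be positive")
    boundaries = list(range(0, n_items, n_per_run))
    boundaries.append(n_items)
    return boundaries


def iter_chunks(items, splitlist):
    """Yield consecutive slices items[splitlist[i]:splitlist[i+1]]."""
    for i in range(len(splitlist) - 1):
        yield items[splitlist[i]:splitlist[i + 1]]


patches = list(range(10))  # stand-in for the patches cut from one scan
splitlist = make_splitlist(len(patches), n_per_run=4)
chunks = list(iter_chunks(patches, splitlist))

print(splitlist)  # [0, 4, 8, 10]
print(chunks)     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With a smaller n_per_run, each slice is shorter, so fewer patches are resident on the GPU at once.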
Error message:
Traceback (most recent call last):
File "main.py", line 353, in
I am using two GTX 1080 Ti cards. Below is the bash file:

    cd detector
    eps=100
    CUDA_VISIBLE_DEVICES=0,1 python main.py --model res18 -b 4 --epochs $eps --save-dir res18
    CUDA_VISIBLE_DEVICES=0,1 python main.py --model res18 -b 2 --resume results/res18/$eps.ckpt --test 1
    cp results/res18/$eps.ckpt ../../model/detector2.ckpt
I see that the image size is 128×128×128 during training and 208×208×208 during testing. Could the error be related to this?
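To put the two sizes in perspective, here is a rough back-of-the-envelope sketch. It assumes the sizes are cubes of side 128 and 208, and that per-patch activation memory in a convolutional network grows roughly in proportion to the number of input voxels; both are assumptions for illustration, not measurements from this repository.

```python
def voxels(side):
    """Number of voxels in a cube with the given side length."""
    return side ** 3


train_side, test_side = 128, 208
ratio = voxels(test_side) / voxels(train_side)

print(voxels(train_side))  # 2097152
print(voxels(test_side))   # 8998912
print(f"{ratio:.2f}")      # 4.29
```

So a single test patch is roughly four times the size of a training patch. This can matter if the forward pass at test time still keeps intermediate results for a backward pass.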
I'd suggest that you temporarily remove DataParallel and set the batch size to 1 so that you can debug it manually. A patch of size 208 definitely fits on a 1080 Ti card.
I added the statement del output after line 96 of main.py, and that solved the problem. However, when the run reached the 65th file, another error occurred.
Traceback (most recent call last):
File "main.py", line 349, in
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x00000196E28DC3C8>>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 86, in rebuild_storage_filename
    storage = cls._new_shared_filename(manager, handle, size)
RuntimeError: Couldn't open shared event: <torch_4404_3118327515_event>, error code: <2> at ..\src\TH\THAllocator.c:218
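I have the same problem: running the test fails with an out-of-memory error, even though training works. I also tried reducing the batch_size for the test, but it still fails. Have you solved it? I would be very grateful for any advice. @Carl-Lei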
I ran into the same problem too.
You can add 'with torch.no_grad():' before these two lines in the test loop, and indent them and the forward pass under it:

    input = Variable(data[splitlist[i]:splitlist[i+1]], volatile = True).cuda()
    inputcoord = Variable(coord[splitlist[i]:splitlist[i+1]], volatile = True).cuda()
Under PyTorch 1.x and later, you should use the following code:

    def test(data_loader, net, get_pbb, save_dir, config):
        ...
        use_cuda = torch.cuda.is_available()
        device = torch.device("cuda" if use_cuda else "cpu")
        ..
        for i_name, (data, target, coord, nzhw) in enumerate(data_loader):
            ...
            input = data[splitlist[i] : splitlist[i + 1]].to(device, non_blocking=True)
            inputcoord = coord[splitlist[i] : splitlist[i + 1]].to(device, non_blocking=True)
            with torch.no_grad():
                if isfeat:
                    output,feature = net(input, inputcoord)
                    featurelist.append(feature.data.cpu().numpy())
                else:
                    output = net(input, inputcoord)
            ....
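For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch that runs on CPU. The Conv3d layer is only a hypothetical stand-in for the detector (the real net also takes inputcoord), and the tensor sizes are made up so that it runs quickly. It combines the two fixes mentioned in this thread: running inference under torch.no_grad() and dropping the reference to output before the next chunk.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the detector; the real net takes (input, inputcoord).
net = nn.Conv3d(1, 2, kernel_size=3, padding=1)
net.eval()

data = torch.randn(4, 1, 8, 8, 8)  # four small fake patches
splitlist = [0, 2, 4]              # chunk boundaries, as in the test loop

outputlist = []
with torch.no_grad():
    for i in range(len(splitlist) - 1):
        chunk = data[splitlist[i]:splitlist[i + 1]]
        output = net(chunk)
        # Under no_grad the result is not attached to an autograd graph,
        # so no intermediate activations are kept for a backward pass.
        assert not output.requires_grad
        outputlist.append(output.cpu().numpy())
        del output  # drop the reference before the next chunk

print(len(outputlist), outputlist[0].shape)  # 2 (2, 2, 8, 8, 8)
```

In PyTorch 0.4 and later, torch.no_grad() takes over the role that volatile = True played in the older code quoted at the top of the thread.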