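Hello,
I get an "illegal memory access" error as soon as imagenet.py starts reading data. What could be causing it? My GPUs are 2 × V100, so it should not be running out of GPU memory. I have not changed anything in the source code except the dataset path.
Here is the error log: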
root@test-6gwz28fvc:/data1/test# python imagenet.py
DALI "gpu" variant
read 1281167 files from 1000 directories
140020509374208 Exception in thread: CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Traceback (most recent call last):
File "imagenet.py", line 105, in <module>
num_threads=4, crop=224, device_id=0, num_gpus=1)
File "imagenet.py", line 67, in get_imagenet_iter_dali
dali_iter_train = DALIClassificationIterator(pip_train, size=pip_train.epoch_size("Reader") // world_size)
File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 338, in __init__
last_batch_padded = last_batch_padded)
File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 148, in __init__
self._first_batch = self.next()
File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 245, in next
return self.__next__()
File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
outputs.append(p.share_outputs())
File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: Error in thread 0: CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Current pipeline object is no longer valid.
terminate called after throwing an instance of 'dali::CUDAError'
what(): CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Aborted (core dumped)
Could you help take a look? Thanks.