Plenty of GPU memory is still free, but training fails with OSError: (External) CUDA error(700), an illegal memory access was encountered.
Checklist:
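- Searched existing issues for an answer
- Read the FAQ and Q&A summary
- Confirmed the bug is still unfixed in the latest version
- Read the PaddleX documentation
Problem description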
I took the object detection example program from PaddleX and changed the data processing, mainly the image size. I also reduced the training batch size to 4. Training then fails with: OSError: (External) CUDA error(700), an illegal memory access was encountered. [Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)
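Reproduction
- Can you run the tutorial we provide without errors? No
- Did you modify the tutorial code? If so, please provide the code you ran. Yes, I modified it: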
import paddlex as pdx
from paddlex import transforms as T
# Download and extract the insect detection dataset
dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(dataset, path='./')
# Define the transforms used for training and evaluation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
train_transforms = T.Compose([
    T.Resize([320, 320], interp='CUBIC'),
    T.RandomVerticalFlip(),
    T.RandomCrop(),
    T.Normalize()
])
eval_transforms = T.Compose([
    T.Resize([320, 320], interp='CUBIC'),
    T.RandomHorizontalFlip(),
    T.CenterCrop(),
    T.Normalize()
])
# Define the datasets used for training and evaluation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.VOCDetection(
    data_dir='insect_det',
    file_list='insect_det/train_list.txt',
    label_list='insect_det/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='insect_det',
    file_list='insect_det/val_list.txt',
    label_list='insect_det/labels.txt',
    transforms=eval_transforms,
    shuffle=False)
# Initialize the model and train it
# Training metrics can be viewed with VisualDL, see https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/visualdl.md
num_classes = len(train_dataset.labels)
model = pdx.det.PicoDet(num_classes=num_classes)
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop//docs/apis/models/detection.md
# Parameter descriptions and tuning notes: https://github.com/PaddlePaddle/PaddleX/blob/develop//docs/parameters.md
model.train(
    num_epochs=270,
    train_dataset=train_dataset,
    train_batch_size=4,
    eval_dataset=eval_dataset,
    learning_rate=0.001 / 8,
    warmup_steps=1000,
    warmup_start_lr=0.0,
    save_interval_epochs=5,
    lr_decay_epochs=[216, 243],
    save_dir='output/yolov3_darknet53')
- Which dataset are you using? The one downloaded by the example program itself
- Please provide the error message and related logs: OSError: (External) CUDA error(700), an illegal memory access was encountered. [Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)
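Since the reported changes are all on the data side, one inexpensive thing to rule out is an annotation that leaves the model with no usable ground-truth box. The following is only a sketch using the Python standard library, not a confirmed diagnosis. The helper names `find_suspect_boxes` and `scan_file_list` are hypothetical, and the file list is assumed to contain one `image_path xml_path` pair per line.

```python
import os
import xml.etree.ElementTree as ET


def find_suspect_boxes(xml_data):
    """Return (object_count, problems) for one Pascal VOC annotation.

    xml_data may be a str or bytes holding the XML document.
    """
    root = ET.fromstring(xml_data)
    size = root.find("size")
    # A width or height of 0 means "unknown", so bounds checks are skipped.
    width = float(size.findtext("width", "0")) if size is not None else 0.0
    height = float(size.findtext("height", "0")) if size is not None else 0.0

    problems = []
    objects = root.findall("object")
    for obj in objects:
        name = obj.findtext("name", "?")
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"{name}: missing bndbox")
            continue
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag, "nan"))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Comparisons with NaN are False, so missing values are flagged too.
        if not (xmax > xmin and ymax > ymin):
            problems.append(f"{name}: degenerate box {xmin, ymin, xmax, ymax}")
        elif width > 0 and height > 0 and (
            xmin < 0 or ymin < 0 or xmax > width or ymax > height
        ):
            problems.append(f"{name}: box outside image {xmin, ymin, xmax, ymax}")
    return len(objects), problems


def scan_file_list(data_dir, file_list):
    """Print every annotation that has no objects or a suspect box."""
    with open(file_list, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            xml_path = os.path.join(data_dir, parts[1])
            with open(xml_path, "rb") as xml_file:
                count, problems = find_suspect_boxes(xml_file.read())
            if count == 0 or problems:
                print(xml_path, "objects:", count, problems)


# Example call for the dataset used in the script above:
# scan_file_list("insect_det", "insect_det/train_list.txt")
```

If the scan prints nothing, the raw annotations are probably not the cause. Boxes that are removed later by the transforms themselves are not covered by this check.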
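Environment
- Please provide the PaddlePaddle and PaddleX versions you are using: paddlepaddle-gpu 2.3.1.post116, PaddleX 2.1.0
- Please provide your operating system, e.g. Linux/Windows/MacOS: Windows
- Which Python version are you using? Python 3.7
- Which CUDA/cuDNN versions are you using? CUDA 11.6, cuDNN 8.4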
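Version strings typed by hand are easy to get wrong, so it can help to paste what the interpreter itself reports. The snippet below is a minimal sketch: `collect_versions` is a hypothetical helper name, and it assumes that `paddle.version.cuda()`, `paddle.version.cudnn()` and `paddle.device.get_device()` are available in the installed Paddle 2.x build. It falls back gracefully when Paddle or PaddleX is not installed.

```python
import platform


def collect_versions():
    """Gather version information useful for a bug report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import paddle
    except ImportError:
        info["paddle"] = "not installed"
        return info

    info["paddle"] = paddle.__version__
    # CUDA/cuDNN versions that this Paddle wheel was built against.
    info["cuda (build)"] = paddle.version.cuda()
    info["cudnn (build)"] = paddle.version.cudnn()
    info["device"] = paddle.device.get_device()

    try:
        import paddlex
        info["paddlex"] = getattr(paddlex, "__version__", "unknown")
    except ImportError:
        info["paddlex"] = "not installed"
    return info


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```

Note that the build versions can differ from the driver and runtime versions that Paddle prints in its own startup log, so both are worth including in a report.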
I later found that the error only occurs with the PicoDet model.
Try downgrading CUDA to 11.2.
python OK.py
2022-08-22 17:16:52,476-WARNING: type object 'QuantizationTransformPass' has no attribute '_supported_quantizable_op_type'
2022-08-22 17:16:52,476-WARNING: If you want to use training-aware and post-training quantization, please use Paddle >= 1.8.4 or develop version
D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppcls\data\preprocess\ops\timm_autoaugment.py:38: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppcls\data\preprocess\ops\timm_autoaugment.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
2022-08-22 17:16:53 [INFO] Starting to read file list from dataset...
2022-08-22 17:16:57 [INFO] 3399 samples in file D:\paddlex_workspace08\datasets\D0039\train_list.txt, including 3399 positive samples and 0 negative samples.
creating index...
index created!
2022-08-22 17:16:57 [INFO] Starting to read file list from dataset...
2022-08-22 17:16:58 [INFO] 970 samples in file D:\paddlex_workspace08\datasets\D0039\val_list.txt, including 970 positive samples and 0 negative samples.
creating index...
index created!
2022-08-22 17:17:02 [INFO] 5555 negative samples added. Dataset contains 3399 positive samples and 5555 negative samples.
W0822 17:17:02.240533 9516 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.2
W0822 17:17:02.256193 9516 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-08-22 17:17:02 [INFO] Loading pretrained model from D:\paddlex_workspace08\projects\P0037\T0085\output\pretrain\picodet_lcnet_1_5x_416_coco.pdparams
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls0.weight doesn't match.(Pretrained: [112, 128, 1, 1], Actual: [36, 128, 1, 1])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls0.bias doesn't match.(Pretrained: [112], Actual: [36])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls1.weight doesn't match.(Pretrained: [112, 128, 1, 1], Actual: [36, 128, 1, 1])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls1.bias doesn't match.(Pretrained: [112], Actual: [36])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls2.weight doesn't match.(Pretrained: [112, 128, 1, 1], Actual: [36, 128, 1, 1])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls2.bias doesn't match.(Pretrained: [112], Actual: [36])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls3.weight doesn't match.(Pretrained: [112, 128, 1, 1], Actual: [36, 128, 1, 1])
2022-08-22 17:17:02 [WARNING] [SKIP] Shape of pretrained params head.head_cls3.bias doesn't match.(Pretrained: [112], Actual: [36])
2022-08-22 17:17:02 [INFO] There are 483/491 variables loaded into PicoDet.
Traceback (most recent call last):
File "tranOK.py", line 95, in <module>
resume_checkpoint=None)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\cv\models\detector.py", line 910, in train
resume_checkpoint=resume_checkpoint)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\cv\models\detector.py", line 334, in train
use_vdl=use_vdl)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\cv\models\base.py", line 337, in train_loop
outputs = self.run(self.net, data, mode='train')
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\cv\models\detector.py", line 105, in run
net_out = net(inputs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
out = self.get_loss()
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\architectures\picodet.py", line 79, in get_loss
loss_gfl = self.head.get_loss(head_outs, self.inputs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\heads\simota_head.py", line 384, in get_loss
gt_box, gt_label)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\heads\simota_head.py", line 113, in _get_target_single
flatten_bbox, gt_bboxes, gt_labels)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\assigners\simota_assigner.py", line 189, in __call__
gt_bboxes) # [num_points,num_gts]
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddlex\ppdet\modeling\bbox_utils.py", line 227, in batch_bbox_overlaps
eps = paddle.to_tensor([eps])
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
return wrapped_func(*args, **kwargs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
return func(*args, **kwargs)
File "D:\anaconda308\envs\paddleX2022\lib\site-packages\paddle\tensor\creation.py", line 189, in to_tensor
stop_gradient=stop_gradient)
OSError: (External) CUDA error(700), an illegal memory access was encountered.
[Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)
paddlepaddle-gpu 2.3.2.post112
paddlex 2.1.0
cuda 11.2
cuDNN Version: 8.2
python 3.7
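CUDA kernels are launched asynchronously, so an illegal memory access is often reported at a later call than the one that caused it. The traceback above ends in `paddle.to_tensor`, which may therefore not be the faulting operation. A minimal sketch for narrowing this down is to force synchronous launches with the CUDA runtime variable `CUDA_LAUNCH_BLOCKING` before the framework is imported; the helper name `enable_blocking_launches` is hypothetical.

```python
import os


def enable_blocking_launches(env=None):
    """Ask the CUDA runtime to run kernels synchronously.

    Must be called before the first CUDA call, i.e. before importing paddle.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_launches()

# Only now import the framework and run the unchanged training script, e.g.:
# import paddlex as pdx
# model = pdx.det.PicoDet(num_classes=num_classes)
# model.train(...)
```

Training runs slower in this mode, but the Python traceback should then point much closer to the operation that actually triggered error 700.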
I hit the same problem when training the PicoDet model.
Training other models works fine; only PicoDet fails! @lailuboy
Did you ever get this solved? @andeluleidisi
Same problem with PicoDet.