OSError: (External) CUDA error(3), initialization error.

Open HustleOoo opened this issue 2 years ago • 7 comments

After starting model training through the paddlex_restful API, the following error is reported:

This log file path is /home/zksc/paddlex_workspace/projects/P0076/T0332/err.log
Note: entries marked WARNING/INFO are only warnings or hints, not errors.
2022-12-01 11:24:43,375-WARNING: type object 'QuantizationTransformPass' has no attribute '_supported_quantizable_op_type'
2022-12-01 11:24:43,375-WARNING: If you want to use training-aware and post-training quantization, please use Paddle >= 1.8.4 or develop version
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
Process Process-3:
Traceback (most recent call last):
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex_restful/restful/project/operate.py", line 94, in _call_paddlex_train
    train(task_path, dataset_path, params['train'])
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex_restful/restful/project/train/detection.py", line 224, in train
    model = detector(num_classes=num_classes, backbone=params.backbone)
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex/cv/models/detector.py", line 972, in __init__
    backbone = self._get_backbone('DarkNet', norm_type=norm_type)
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex/cv/models/detector.py", line 101, in _get_backbone
    backbone = getattr(ppdet.modeling, backbone_name)(**params)
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex/ppdet/modeling/backbones/darknet.py", line 275, in __init__
    self.conv0 = ConvBNLayer(
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddlex-2.1.0-py3.9.egg/paddlex/ppdet/modeling/backbones/darknet.py", line 58, in __init__
    self.conv = nn.Conv2D(
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/nn/layer/conv.py", line 644, in __init__
    super(Conv2D, self).__init__(
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/nn/layer/conv.py", line 133, in __init__
    self.weight = self.create_parameter(
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 423, in create_parameter
    return self._helper.create_parameter(temp_attr, shape, dtype, is_bias,
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/fluid/layer_helper_base.py", line 376, in create_parameter
    return self.main_program.global_block().create_parameter(
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/fluid/framework.py", line 3572, in create_parameter
    initializer(param, self)
  File "/home/zksc/anaconda3/envs/paddleX/lib/python3.9/site-packages/paddle/fluid/initializer.py", line 365, in __call__
    out_var = _C_ops.gaussian_random(
OSError: (External) CUDA error(3), initialization error.
  [Hint: 'cudaErrorInitializationError'. The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:243)
  [operator < gaussian_random > error]

Environment: PaddleX 2.1.0, Linux, Python 3.9.13, CUDA V11.8.89, cuDNN 8.6.0
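The traceback shows the failure happening inside a `multiprocessing` child (`Process-3`). One known cause of `cudaErrorInitializationError` in that situation is a child created with the `fork` start method (the Linux default on Python 3.9) after the parent process has already initialized CUDA. Nothing in this thread confirms that this is what happened here, so the following is only a minimal sketch of the usual workaround: creating the child with the `spawn` start method. The names `make_training_process` and `train_stub` are hypothetical and are not part of PaddleX.

```python
import multiprocessing as mp


def make_training_process(target, args=()):
    """Build (but do not start) a child process that uses the 'spawn' start method.

    A spawned child starts from a fresh interpreter, so it does not inherit
    any CUDA state that the parent process may already have initialized.
    """
    ctx = mp.get_context("spawn")
    return ctx.Process(target=target, args=args)


def train_stub(task_path):
    # Hypothetical stand-in for the real training entry point.
    print(f"training would start for {task_path}")
```

A caller would then run `proc = make_training_process(train_stub, ("some/task/path",))` followed by `proc.start()` and `proc.join()`, with the calling code placed under an `if __name__ == "__main__":` guard as `spawn` requires.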

HustleOoo avatar Dec 01 '22 05:12 HustleOoo

Which GPU exactly are you using? Please try downgrading CUDA to 11.2.
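Before and after changing the CUDA version, it can help to record what the driver and the installed Paddle build each report. The sketch below is a hypothetical helper (`collect_gpu_env` is not part of PaddleX or Paddle). It assumes that `nvidia-smi` may or may not be on the PATH, and that `paddle.version.cuda()`, `paddle.version.cudnn()` and `paddle.device.cuda.device_count()` are available in the installed Paddle 2.x build. Any failure is recorded rather than raised.

```python
import importlib.util
import os
import shutil
import subprocess


def collect_gpu_env():
    """Gather basic facts about the GPU software stack without raising."""
    info = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi": None,
        "paddle_cuda": None,
        "paddle_cudnn": None,
        "paddle_device_count": None,
    }

    # Driver side: ask nvidia-smi (if installed) for GPU name and driver version.
    smi = shutil.which("nvidia-smi")
    if smi:
        try:
            result = subprocess.run(
                [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=30,
            )
            info["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            info["nvidia_smi"] = f"failed: {exc}"

    # Framework side: only touch Paddle if it is importable in this environment.
    if importlib.util.find_spec("paddle") is not None:
        try:
            import paddle
            info["paddle_cuda"] = paddle.version.cuda()    # CUDA version Paddle was built with
            info["paddle_cudnn"] = paddle.version.cudnn()  # cuDNN version Paddle was built with
            info["paddle_device_count"] = paddle.device.cuda.device_count()
        except Exception as exc:
            info["paddle_error"] = repr(exc)
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_env().items():
        print(f"{key}: {value}")
```

If the CUDA version reported by Paddle differs from the toolkit installed on the machine, or `nvidia-smi` itself fails, that narrows down whether the problem sits in the driver or in the framework install.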

lailuboy avatar Dec 01 '22 07:12 lailuboy

> Which GPU exactly are you using? Please try downgrading CUDA to 11.2.

The GPU is a GeForce RTX 3080 Ti.

HustleOoo avatar Dec 01 '22 08:12 HustleOoo

> Which GPU exactly are you using? Please try downgrading CUDA to 11.2.

The same error is still reported after downgrading.

HustleOoo avatar Dec 01 '22 09:12 HustleOoo

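The GPU is a 3080 Ti. After downgrading CUDA to 11.2, the same error is still reported. Training ran fine a few days ago and suddenly stopped working yesterday.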

On 2022-12-01 15:19:04, "laibaohua" @.***> wrote:

> Which GPU exactly are you using? Please try downgrading CUDA to 11.2.

HustleOoo avatar Dec 01 '22 09:12 HustleOoo

[image] CUDA has already been downgraded to 11.2. Could we exchange WeChat or some other contact details?

xqxq-2020 avatar Dec 01 '22 10:12 xqxq-2020

[image] This error is still being reported.

xqxq-2020 avatar Dec 02 '22 03:12 xqxq-2020

Has this been solved?

jack00000 avatar Dec 07 '23 06:12 jack00000