PaddleSeg
paddlepaddle-gpu 2.3.0.post110 run error
File "/data/PaddleSeg-2.5/paddleseg/core/train.py", line 204, in train
logits_list = ddp_model(images) if nranks > 1 else model(images)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/PaddleSeg-2.5/paddleseg/models/segmenter.py", line 55, in forward
feats, shape = self.backbone(x)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/PaddleSeg-2.5/paddleseg/models/backbones/vision_transformer.py", line 276, in forward
x = blk(x)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/PaddleSeg-2.5/paddleseg/models/backbones/vision_transformer.py", line 119, in forward
x = x + self.drop_path(self.attn(self.norm1(x)))
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/PaddleSeg-2.5/paddleseg/models/backbones/vision_transformer.py", line 73, in forward
qkv = self.qkv(x).reshape((-1, N, 3, self.num_heads, C //
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/nn/layer/common.py", line 172, in forward
x=input, weight=self.weight, bias=self.bias, name=self.name)
File "/data/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/nn/functional/common.py", line 1542, in linear
False)
OSError: (External) CUBLAS error(13).
[Hint: 'CUBLAS_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons. To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed. ] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35)
[operator < matmul_v2 > error]
NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
This could be caused by the GPU running out of memory or by a kernel launch failure. Did you check your device with "nvidia-smi"? Which script did you use to run this program? And did you change any of our code?
GPU  Name                Persistence-M  Bus-Id            Disp.A  Uncorr. ECC  Fan  Temp  Perf  Pwr:Usage/Cap  Memory-Usage         GPU-Util  Compute M.  MIG M.
0    NVIDIA GeForce ...  On             00000000:3B:00.0  Off     N/A          30%  25C   P8    19W / 350W     5MiB / 24576MiB      0%        Default     N/A
1    NVIDIA GeForce ...  On             00000000:B1:00.0  Off     N/A          51%  62C   P2    182W / 350W    22705MiB / 24576MiB  50%       Default     N/A

Processes:
GPU  GI ID  CI ID  PID    Type  Process name        GPU Memory Usage
0    N/A    N/A    2635   G     /usr/lib/xorg/Xorg  4MiB
1    N/A    N/A    2635   G     /usr/lib/xorg/Xorg  4MiB
1    N/A    N/A    29918  C     python              22697MiB
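The out-of-memory theory can be checked without reading the table by eye. In the nvidia-smi output above, GPU 1 already has about 22.7 GB of its 24 GB taken by another python process (PID 29918), so it is worth making sure training runs on a card with free memory. The sketch below is a hypothetical helper, not part of PaddleSeg or Paddle: it parses the CSV output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` and picks the card with the most free memory. The function names and the 8000 MiB threshold in the usage note are illustrative assumptions.

```python
import subprocess

# Standard nvidia-smi query flags; memory values are reported in MiB.
NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into (index, used_mib, total_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append((index, used, total))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(
        NVIDIA_SMI_QUERY, capture_output=True, text=True, check=True
    )
    return parse_gpu_memory(result.stdout)


def pick_free_gpu(gpus, min_free_mib):
    """Return the index of the GPU with the most free memory,
    or None if even that GPU has less than min_free_mib MiB free."""
    best = max(gpus, key=lambda g: g[2] - g[1], default=None)
    if best is None or best[2] - best[1] < min_free_mib:
        return None
    return best[0]
```

Usage sketch: `gpu = pick_free_gpu(query_gpus(), min_free_mib=8000)`, then start training with `CUDA_VISIBLE_DEVICES=<gpu> python train.py ...`. The variable has to be set before Paddle initializes CUDA, which is why it is passed on the command line rather than changed inside train.py.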
python train.py --config custom_config/segmenter_vit_base_linear_xxx_512x512_160k.yml --do_eval --use_vdl --save_interval 500 --save_dir segmenter_vit_base_linear_xxx_20220610_512x512_160k
I got the same error when running the SegFormer model.
nvidia-smi
NVIDIA-SMI 510.73.05  Driver Version: 510.73.05  CUDA Version: 11.6
GPU  Name                Persistence-M  Bus-Id            Disp.A  Uncorr. ECC  Fan  Temp  Perf  Pwr:Usage/Cap  Memory-Usage       GPU-Util  Compute M.  MIG M.
0    NVIDIA GeForce ...  Off            00000000:01:00.0  On      N/A          36%  62C   P2    105W / 350W    541MiB / 12288MiB  0%        Default     N/A

Processes:
GPU  GI ID  CI ID  PID   Type  Process name          GPU Memory Usage
0    N/A    N/A    1319  G     /usr/lib/xorg/Xorg    35MiB
0    N/A    N/A    2121  G     /usr/lib/xorg/Xorg    102MiB
0    N/A    N/A    2250  G     /usr/bin/gnome-shell  27MiB
0    N/A    N/A    5491  C     python3               359MiB
from scipy.ndimage.interpolation import shift
/home/boe-malenia-23/anaconda3/PaddleSeg/paddleseg/transforms/functional.py:18: DeprecationWarning: Please use distance_transform_edt from the scipy.ndimage namespace, the scipy.ndimage.morphology namespace is deprecated.
from scipy.ndimage.morphology import distance_transform_edt
2022-06-27 17:50:21 [INFO]
------------Environment Information-------------
platform: Linux-5.13.0-51-generic-x86_64-with-glibc2.17
Python: 3.8.10 (default, Jun 4 2021, 15:09:15) [GCC 7.5.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.6.r11.6/compiler.31057947_0
cudnn: 8.4
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: NVIDIA GeForce']
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PaddleSeg: 2.5.0
PaddlePaddle: 2.3.0
OpenCV: 4.5.3-openvino
2022-06-27 17:50:21 [INFO]
---------------Config Information---------------
batch_size: 1
distill_loss:
  coef:
  - 3
  types:
  - type: KLLoss
iters: 1000
loss:
  coef:
  - 1
  types:
  - ignore_index: 255
    type: CrossEntropyLoss
lr_scheduler:
  learning_rate: 6.0e-05
  power: 1
  type: PolynomialDecay
model:
  num_classes: 19
  pretrained: https://bj.bcebos.com/paddleseg/dygraph/mix_vision_transformer_b2.tar.gz
  type: SegFormer_B2
optimizer:
  beta1: 0.9
  beta2: 0.999
  type: AdamW
  weight_decay: 0.01
train_dataset:
  dataset_root: data/cityscape
  mode: train
  transforms:
  - target_size:
    - 1024
    - 1024
    type: Resize
  - type: RandomHorizontalFlip
  - type: Normalize
  type: Cityscapes
val_dataset:
  dataset_root: data/cityscape
  mode: val
  transforms:
  - target_size:
    - 1024
    - 1024
    type: Resize
  - type: Normalize
  type: Cityscapes
W0627 17:50:21.221050 27212 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
W0627 17:50:21.221062 27212 gpu_context.cc:306] device: 0, cuDNN Version: 8.4.
2022-06-27 17:50:22 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/mix_vision_transformer_b2.tar.gz
2022-06-27 17:50:22 [WARNING] linear_c4.proj.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c4.proj.bias is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c3.proj.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c3.proj.bias is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c2.proj.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c2.proj.bias is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c1.proj.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_c1.proj.bias is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_fuse._conv.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_fuse._batch_norm.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_fuse._batch_norm.bias is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_fuse._batch_norm._mean is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_fuse._batch_norm._variance is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_pred.weight is not in pretrained model
2022-06-27 17:50:22 [WARNING] linear_pred.bias is not in pretrained model
2022-06-27 17:50:22 [INFO] There are 332/347 variables loaded into SegFormer.
Traceback (most recent call last):
File "train.py", line 230, in <module>
Command:
python3 train.py \
  --config configs/segformer/segformer_b1_cityscapes_1024x1024_160k.yml \
  --do_eval \
  --use_vdl \
  --save_interval 500 \
  --save_dir segformerB2
I get the same error when running PaddleOCR.
As the hint says, check that the hardware, an appropriate driver version, and the cuBLAS library are correctly installed. This is most likely an environment problem, for example a mismatch between the cuBLAS library and the driver. Try creating a new environment with conda and installing Paddle built for CUDA 10.2 and cuDNN 7.6.5, with a compatible driver version.
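One part of the environment check suggested above can be automated. At startup Paddle logs a gpu_context.cc warning with the CUDA Driver API and Runtime API versions (in this thread: Driver API 11.6, Runtime API 11.2). Because the CUDA driver is backward compatible, the driver's API version should be at least the runtime's. The sketch below is a hypothetical helper that parses that log line and performs only this comparison; its regular expression assumes the exact wording shown in the log above. For the log in this thread the check passes, so it does not by itself explain the cuBLAS failure, and it says nothing about whether the cuBLAS library or the Paddle build matches the GPU.

```python
import re

# Assumed format of Paddle's startup warning, copied from the log above:
# "... Driver API Version: 11.6, Runtime API Version: 11.2"
_VERSION_RE = re.compile(
    r"Driver API Version:\s*(\d+)\.(\d+),\s*Runtime API Version:\s*(\d+)\.(\d+)"
)


def parse_api_versions(log_text):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor)),
    or None if no matching line is found."""
    match = _VERSION_RE.search(log_text)
    if match is None:
        return None
    d_major, d_minor, r_major, r_minor = (int(g) for g in match.groups())
    return (d_major, d_minor), (r_major, r_minor)


def driver_supports_runtime(log_text):
    """True if the driver's CUDA API version is at least the runtime's."""
    versions = parse_api_versions(log_text)
    if versions is None:
        raise ValueError("no Driver/Runtime API version line found")
    driver, runtime = versions
    # Tuples compare element by element: (major, minor).
    return driver >= runtime
```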