PaddleSeg
[Bug]OSError: (External) Cuda error(719), unspecified launch failure.
Thanks for your bug report. To help us solve the issue better, please provide the following information:
- PaddleSeg version: PaddleSeg release/2.5.0
- PaddlePaddle version: (e.g. PaddlePaddle 2.1.2)
- Operating system: BML CodeLab
- Python version: 3.7.4
- CUDA/cuDNN version: CUDA10.1/cuDNN 7.6
- Full codes:
# Install paddleseg
! pip install -q paddleseg
# Unzip the dataset
! mkdir /home/aistudio/DataSet
! unzip -q /home/aistudio/data/data167519/22.zip -d DataSet
! pip install paddlex==2.0.0
import paddlex as pdx
!paddlex --split_dataset --format Seg --dataset_dir 'DataSet' --val_value 0.2 --test_value 0.1
from paddlex import transforms as T

train_transforms = T.Compose([
    T.Resize(target_size=256),
    T.RandomHorizontalFlip(),
    T.Normalize(
        mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

eval_transforms = T.Compose([
    T.Resize(target_size=256),
    T.Normalize(
        mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

train_dataset = pdx.datasets.SegDataset(
    data_dir='DataSet',
    file_list='DataSet/train_list.txt',
    label_list='DataSet/labels.txt',
    transforms=train_transforms,
    shuffle=True
)
eval_dataset = pdx.datasets.SegDataset(
    data_dir='DataSet',
    file_list='DataSet/val_list.txt',
    label_list='DataSet/labels.txt',
    transforms=eval_transforms)

num_classes = len(train_dataset.labels)
print(num_classes)
model = pdx.seg.BiSeNetV2(num_classes=num_classes)

model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=4,
    eval_dataset=eval_dataset,
    learning_rate=0.05,
    # binarize_labels=True
    save_dir='output/bisenet'
    # use_vdl=True
)
- Detailed error information, related running log:
2
W0912 17:01:14.498520 164 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0912 17:01:14.592695 164 device_context.cc:422] device: 0, cuDNN Version: 7.6.
2022-09-12 17:01:17 [INFO] Loading pretrained model from output/bisenet/pretrain/model.pdparams
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head1.conv_1x1.weight doesn't match.(Pretrained: [19, 16, 1, 1], Actual: [2, 16, 1, 1])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head1.conv_1x1.bias doesn't match.(Pretrained: [19], Actual: [2])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head2.conv_1x1.weight doesn't match.(Pretrained: [19, 32, 1, 1], Actual: [2, 32, 1, 1])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head2.conv_1x1.bias doesn't match.(Pretrained: [19], Actual: [2])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head3.conv_1x1.weight doesn't match.(Pretrained: [19, 64, 1, 1], Actual: [2, 64, 1, 1])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head3.conv_1x1.bias doesn't match.(Pretrained: [19], Actual: [2])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head4.conv_1x1.weight doesn't match.(Pretrained: [19, 128, 1, 1], Actual: [2, 128, 1, 1])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params aux_head4.conv_1x1.bias doesn't match.(Pretrained: [19], Actual: [2])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params head.conv_1x1.weight doesn't match.(Pretrained: [19, 128, 1, 1], Actual: [2, 128, 1, 1])
2022-09-12 17:01:17 [WARNING] [SKIP] Shape of pretrained params head.conv_1x1.bias doesn't match.(Pretrained: [19], Actual: [2])
2022-09-12 17:01:17 [INFO] There are 346/356 variables loaded into BiSeNetV2.
OSError Traceback (most recent call last)
/tmp/ipykernel_164/2351295887.py in
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/segmenter.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, optimizer, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, learning_rate, lr_decay_power, early_stop, early_stop_patience, use_vdl, resume_checkpoint)
    300 early_stop=early_stop,
    301 early_stop_patience=early_stop_patience,
--> 302 use_vdl=use_vdl)
    303
    304 def quant_aware_train(self,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, ema, early_stop, early_stop_patience, use_vdl)
    341 self.optimizer._learning_rate.step()
    342
--> 343 train_avg_metrics.update(outputs)
    344 outputs['lr'] = lr
    345 if ema is not None:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/utils/stats.py in update(self, stats)
     49 }
     50 for k, v in self.meters.items():
---> 51 v.update(stats[k].numpy())
     52
     53 def get(self, extras=None):
OSError: (External) Cuda error(719), unspecified launch failure. [Advise: Please search for the error code(719) on website( https://docs.nvidia.com/cuda/archive/10.0/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038 ) to get Nvidia's official solution about CUDA Error.] (at /paddle/paddle/fluid/platform/gpu_info.cc:394)
- Running command or reproduce details:
- Additional context: The GPU environment used has 2 cores, 16 total GPU memory, and a V100 AI accelerator card.
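The traceback ends at a `.numpy()` call, but CUDA kernels run asynchronously, so that line is only where the failure is reported, not necessarily where it happened. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before the GPU is first used makes errors surface closer to the failing kernel. One cause worth ruling out, which is an assumption and not something confirmed in this thread, is mask pixel values outside the range `[0, num_classes)`. The sketch below counts such values. The helper name `out_of_range_label_values` is hypothetical, and `ignore_index=255` is assumed to be the value the loss skips.

```python
from collections import Counter


def out_of_range_label_values(pixels, num_classes, ignore_index=255):
    """Count mask pixel values that are neither a valid class id nor ignore_index.

    pixels: any iterable of integer label values, one per pixel.
    Returns {bad_value: pixel_count}; an empty dict means the mask looks clean.
    Note: ignore_index=255 is an assumption about the training setup.
    """
    counts = Counter(int(p) for p in pixels)
    return {
        value: n
        for value, n in sorted(counts.items())
        if not (0 <= value < num_classes) and value != ignore_index
    }


# Toy flattened mask: a two-class dataset should only contain 0 and 1
# (plus 255 if that value is used for ignored pixels).
toy_mask = [0, 0, 1, 1, 128, 128, 128, 255]
print(out_of_range_label_values(toy_mask, num_classes=2))  # prints {128: 3}
```

For real annotation files, the pixel iterable could come from Pillow (assuming it is installed and the mask is single-channel), for example `Image.open(mask_path).getdata()`. Any non-empty result for a mask listed in `DataSet/train_list.txt` would suggest the labels need remapping before training.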
You installed PaddleX and are training with it, and that is where the error occurs. I suggest training directly with PaddleSeg instead.
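A minimal sketch of what training directly with PaddleSeg might look like for the same dataset, assuming the PaddleSeg 2.x Python API as described in its quick-start material. The function names `epochs_to_iters` and `train_with_paddleseg` are hypothetical, the file-list paths mirror the ones in the report, and every signature should be checked against the installed release. PaddleSeg's training entry point counts iterations rather than epochs, so the first helper converts between the two, assuming the last incomplete batch of each epoch is dropped.

```python
import os


def epochs_to_iters(num_samples, batch_size, num_epochs):
    """Convert an epoch count to an iteration count.

    Assumption: the last incomplete batch of each epoch is dropped.
    """
    steps_per_epoch = max(num_samples // batch_size, 1)
    return steps_per_epoch * num_epochs


def train_with_paddleseg(dataset_root, num_classes, iters, base_lr=0.05):
    """Train BiSeNetV2 with PaddleSeg directly (sketch, not a verified recipe)."""
    # Local imports keep epochs_to_iters usable without PaddlePaddle installed.
    import paddle
    import paddleseg.transforms as T
    from paddleseg.core import train
    from paddleseg.datasets import Dataset
    from paddleseg.models import BiSeNetV2
    from paddleseg.models.losses import CrossEntropyLoss

    train_transforms = [
        T.Resize(target_size=(256, 256)),
        T.RandomHorizontalFlip(),
        T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ]
    val_transforms = [
        T.Resize(target_size=(256, 256)),
        T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ]

    train_dataset = Dataset(
        mode='train',
        dataset_root=dataset_root,
        transforms=train_transforms,
        num_classes=num_classes,
        train_path=os.path.join(dataset_root, 'train_list.txt'))
    val_dataset = Dataset(
        mode='val',
        dataset_root=dataset_root,
        transforms=val_transforms,
        num_classes=num_classes,
        val_path=os.path.join(dataset_root, 'val_list.txt'))

    model = BiSeNetV2(num_classes=num_classes)

    lr = paddle.optimizer.lr.PolynomialDecay(
        learning_rate=base_lr, decay_steps=iters, power=0.9, end_lr=0)
    optimizer = paddle.optimizer.Momentum(
        learning_rate=lr, momentum=0.9,
        parameters=model.parameters(), weight_decay=4.0e-5)

    # BiSeNetV2 returns five outputs in training mode (main head plus four
    # auxiliary heads), so one loss is configured per output.
    losses = {'types': [CrossEntropyLoss()] * 5, 'coef': [1] * 5}

    train(
        model=model,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        optimizer=optimizer,
        save_dir='output/bisenet',
        iters=iters,
        batch_size=4,
        losses=losses)
```

A call might look like `train_with_paddleseg('DataSet', num_classes=2, iters=epochs_to_iters(n_train, 4, 10))`, where `n_train` is the number of lines in `DataSet/train_list.txt`.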