PaddleSeg: bug occurs when running

Open wanghonen opened this issue 1 year ago • 6 comments

Search before asking

[X] I have searched the open and closed issues and found no similar bug report.

Describe the Bug

2024-08-08 06:16:46 [INFO]
------------Environment Information-------------
platform: Linux-5.4.0-139-generic-x86_64-with-debian-stretch-sid
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
cudnn: 8.2
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: Tesla V100-SXM2-32GB']
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PaddleSeg: 2.7.0
PaddlePaddle: 2.3.2
OpenCV: 4.1.1

2024-08-08 06:16:46 [INFO]
---------------Config Information---------------
batch_size: 16
iters: 30000
loss:
  coef:
  - 1
  types:
  - coef:
    - 0.8
    - 0.2
    losses:
    - type: CrossEntropyLoss
    - type: LovaszSoftmaxLoss
    type: MixedLoss
lr_scheduler:
  learning_rate: 6.0e-05
  power: 1
  type: PolynomialDecay
model:
  align_corners: true
  backbone:
    in_channels: 1
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/backbone/mix_vision_transformer_b3.tar.gz
    type: MixVisionTransformer_B3
  embedding_dim: 768
  num_classes: 7
  type: SegFormer
optimizer:
  beta1: 0.9
  beta2: 0.999
  type: AdamW
  weight_decay: 0.01
train_dataset:
  dataset_root: /home/aistudio/data/src/
  img_channels: 1
  mode: train
  num_classes: 7
  train_path: /home/aistudio/data_split/train.txt
  transforms:
  - max_scale_factor: 1.25
    min_scale_factor: 0.75
    scale_step_size: 0.25
    type: ResizeStepScaling
  - type: RandomVerticalFlip
  - type: RandomBlur
  - type: RandomRotation
  - type: RandomHorizontalFlip
  - crop_size:
    - 512
    - 512
    type: RandomPaddingCrop
  - type: Normalize
  type: Dataset
val_dataset:
  dataset_root: /home/aistudio/data/src/
  img_channels: 1
  mode: val
  num_classes: 7
  transforms:
  - type: Normalize
  type: Dataset
  val_path: /home/aistudio/data_split/val.txt

W0808 06:16:46.936419 6406 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.2
W0808 06:16:46.936477 6406 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2024-08-08 06:16:48 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/backbone/mix_vision_transformer_b3.tar.gz
2024-08-08 06:16:48 [WARNING] [SKIP] Shape of pretrained params patch_embed1.proj.weight doesn't match.(Pretrained: [64, 3, 7, 7], Actual: [64, 1, 7, 7])
2024-08-08 06:16:48 [INFO] There are 571/572 variables loaded into MixVisionTransformer.
2024-08-08 06:16:48 [INFO] use AMP to train. AMP level = O1
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.float16, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float16, but right dtype is paddle.float32, the right dtype will convert to paddle.float16
  format(lhs_dtype, rhs_dtype, lhs_dtype))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
Traceback (most recent call last):
  File "/home/aistudio/PaddleSeg/train.py", line 262, in <module>
    main(args)
  File "/home/aistudio/PaddleSeg/train.py", line 254, in main
    to_static_training=cfg.to_static_training)
  File "/home/aistudio/PaddleSeg/paddleseg/core/train.py", line 200, in train
    scaled.backward()  # do backward
  File "", line 2, in backward
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
    return wrapped_func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 434, in impl
    return func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 293, in backward
    framework._dygraph_tracer())
RuntimeError: (NotFound) There are no kernels which are registered in the pad3d_grad operator.
  [Hint: Expected kernels_iter != all_op_kernels.end(), but received kernels_iter == all_op_kernels.end().] (at /paddle/paddle/fluid/imperative/prepared_operator.cc:341)
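For anyone triaging similar logs, the final RuntimeError line names both the error kind and the operator that lacks a kernel. Below is a minimal sketch of pulling those two fields out of the message. The helper name `missing_kernel_op` is hypothetical, and the regular expression only assumes the message wording shown in the log above; it is not part of PaddlePaddle or PaddleSeg.

```python
import re

# Matches the "no kernels registered" wording exactly as it appears in the log above.
# Other Paddle versions may phrase this differently (assumption).
_PATTERN = re.compile(
    r"\((?P<kind>\w+)\) There are no kernels which are registered "
    r"in the (?P<op>\w+) operator"
)


def missing_kernel_op(message):
    """Return (error_kind, operator_name), or None if the message has another shape."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return match.group("kind"), match.group("op")


# The last line of the traceback from this report.
ERROR_LINE = (
    "RuntimeError: (NotFound) There are no kernels which are registered "
    "in the pad3d_grad operator."
)

result = missing_kernel_op(ERROR_LINE)
print(result)
```

Here the operator name ends in `_grad`, which is consistent with the traceback failing inside `scaled.backward()` rather than in the forward pass.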

Environment

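paddlepaddle 2.7

I ran this program three days ago without any problems, and I have not modified it since. Today it suddenly fails with the error above, and I do not know what caused it. I now have a rough idea of where the problem lies: I am passing --precision fp16. If I remove that option, the program runs normally. However, I was also passing it a few days ago and the program ran fine then. Is this because of something you updated?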

Bug description confirmation

[X] I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

[X] I'd like to help by submitting a PR!

Aug 07 '24 22:08 wanghonen
