PaddleSeg: Why is the loss value negative during DecoupledSegNet training? The environment is an RTX 3090 with CUDA 11.3, using the CUDA 11.2 Docker image provided by your company. Looking forward to your reply.


Open Murphy-MML opened this issue 3 years ago • 4 comments

λ/PaddleSeg python train.py --config configs/decoupled_segnet/decoupledsegnet_resnet50_os8_cityscapes_1024x512_80k.yml --save_dir /PaddleSegData/workdir/decoupledsegnet/decoupledsegnet_resnet50_os8_cityscapes_1024x512_80k --do_eval
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2022-04-28 08:09:43 [INFO]
------------Environment Information-------------
platform: Linux-5.4.0-90-generic-x86_64-with-Ubuntu-18.04-bionic
Python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
cudnn: 8.1
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: NVIDIA GeForce']
GCC: gcc (GCC) 8.2.0
PaddleSeg: 2.5.0
PaddlePaddle: 2.2.2
OpenCV: 4.5.5

2022-04-28 08:09:44 [INFO]
---------------Config Information---------------
batch_size: 4
iters: 80000
loss:
  coef:
  - 1
  - 1
  - 25
  - 1
  types:
  - ignore_index: 255
    type: OhemCrossEntropyLoss
  - ignore_index: 255
    type: RelaxBoundaryLoss
  - edge_label: true
    ignore_index: 255
    type: BCELoss
    weight: dynamic
  - ignore_index: 255
    type: OhemEdgeAttentionLoss
lr_scheduler:
  end_lr: 0.0
  learning_rate: 0.01
  power: 0.9
  type: PolynomialDecay
model:
  align_corners: false
  aspp_out_channels: 256
  aspp_ratios:
  - 1
  - 12
  - 24
  - 36
  backbone:
    multi_grid:
    - 1
    - 2
    - 4
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
    type: ResNet50_vd
  backbone_indices:
  - 0
  - 3
  num_classes: 19
  pretrained: null
  type: DecoupledSegNet
optimizer:
  momentum: 0.9
  type: sgd
  weight_decay: 4.0e-05
train_dataset:
  dataset_root: data/cityscapes
  edge: true
  mode: train
  transforms:
  - max_scale_factor: 2.0
    min_scale_factor: 0.5
    scale_step_size: 0.25
    type: ResizeStepScaling
  - crop_size:
    - 1024
    - 512
    type: RandomPaddingCrop
  - type: RandomHorizontalFlip
  - brightness_range: 0.4
    contrast_range: 0.4
    saturation_range: 0.4
    type: RandomDistort
  - type: Normalize
  type: Cityscapes
val_dataset:
  dataset_root: data/cityscapes
  mode: val
  transforms:
  - type: Normalize
  type: Cityscapes

W0428 08:09:44.089140 9630 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.5, Runtime API Version: 11.2
W0428 08:09:44.089159 9630 device_context.cc:465] device: 0, cuDNN Version: 8.1.
2022-04-28 08:09:45 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
2022-04-28 08:09:45 [INFO] There are 275/275 variables loaded into ResNet_vd.
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/math_op_patch.py:253: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int32, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/math_op_patch.py:253: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/math_op_patch.py:253: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.float32, the right dtype will convert to paddle.int64
  format(lhs_dtype, rhs_dtype, lhs_dtype))
2022-04-28 08:10:17 [INFO] [TRAIN] epoch: 1, iter: 10/80000, loss: -1091.5861, lr: 0.009999, batch_cost: 3.1310, reader_cost: 2.40938, ips: 1.2775 samples/sec | ETA 69:34:11
2022-04-28 08:10:48 [INFO] [TRAIN] epoch: 1, iter: 20/80000, loss: -1140.5043, lr: 0.009998, batch_cost: 3.1686, reader_cost: 2.49397, ips: 1.2624 samples/sec | ETA 70:23:44
2022-04-28 08:11:21 [INFO] [TRAIN] epoch: 1, iter: 30/80000, loss: -1136.8536, lr: 0.009997, batch_cost: 3.2691, reader_cost: 2.57578, ips: 1.2236 samples/sec | ETA 72:37:12

Apr 28 '22 08:04 Murphy-MML

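After I modified the BCELoss function, changing the `if label.shape[1] == logit.shape[1]:` branch so that the label is fed in as 0 or 1, the loss keeps oscillating during training and does not converge, and the mIoU is also very low.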

Apr 29 '22 06:04 Murphy-MML

When I change only these parts on my side, I get this error:

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2) and requested shape (4,2)

Could you copy your binary_cross_entropy_loss.py file and paste it in a reply? Many thanks.

The BCELoss implementation has a problem: forward does not handle the case where edge_label is True.

    if self.edge_label:
        edge_mask = F.mask_to_binary_edge(
            label, radius=2, num_classes=self.num_classes)

    loss = paddle.nn.functional.binary_cross_entropy_with_logits(
        logit, edge_mask, weight=weight, reduction='none', pos_weight=pos_weight)
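
A minimal sketch of why skipping the edge-mask conversion could produce a negative loss, assuming the raw Cityscapes class ids (0 to 18, or 255 for ignored pixels) reach the binary cross-entropy unchanged. It uses plain Python rather than Paddle; `bce_with_logits` is a hypothetical helper that evaluates the standard per-element formula, and the logit and target values are made up for illustration.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Per-element binary cross-entropy on a raw logit (numerically stable form).

    Algebraically equal to
        -(t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x)))
    which simplifies to softplus(x) - x * t.
    """
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - logit * target


# Proper binary targets (0 or 1): the loss cannot go below zero.
print(bce_with_logits(5.0, 1.0))
print(bce_with_logits(5.0, 0.0))

# Class ids used as targets (illustrative values only): the "- x * t" term
# dominates for a positive logit, so the result is negative.
print(bce_with_logits(5.0, 18.0))
print(bce_with_logits(5.0, 255.0))
```

With targets restricted to 0 or 1 the value is never negative. Once a class id is used as the target, a positive logit makes the `- logit * target` term arbitrarily large in magnitude, which may be what the negative values in the training log reflect.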

May 20 '22 01:05 Murphy-MML
