PaddleSeg ValueError: (InvalidArgument) The type of data we are trying to retrieve (float32) does not match the type of data (uint8) currently contained in the container

Search before asking

[X] I have searched the question and found no related answer.

Please ask your question

I am trying to train my custom dataset with FCN HRNet_W18 with the following configuration:

batch_size: 16
iters: 160000

train_dataset:
  type: Dataset
  dataset_root: data/custom
  train_path: data/custom/train_list.txt
  num_classes: 6
  mode: train
  transforms:
    - type: RandomHorizontalFlip
    - type: ResizeStepScaling
      min_scale_factor: 0.75
      max_scale_factor: 1.25
      scale_step_size: 0.25
    - type: Normalize
    - type: RandomCenterCrop
      retain_ratio: [0.8, 0.9]
    - type: RandomNoise
      prob: 0.3
      max_sigma: 3
    - type: RandomBlur
      prob: 0.1
      blur_type: gaussian
    - type: RandomRotation
      max_rotation: 10
    - type: RandomScaleAspect
    - type: RandomDistort
      brightness_range: 0.3
      contrast_range: 0.3
      saturation_range: 0.3
      hue_range: 10
      sharpness_range: 0.2
    - type: RandomAffine
      size: [512,512]
      max_rotation: 10
    #- type: GenerateInstanceTargets


val_dataset:
  type: Dataset
  dataset_root: data/custom
  val_path: data/custom/test_list.txt
  num_classes: 6
  mode: val
  transforms:
    - type: Normalize

optimizer:
  type: SGD
  momentum: 0.9
  weight_decay: 4.0e-5

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0
  power: 0.9

loss:
  types:
    - type: MixedLoss
      losses:
        - type: CrossEntropyLoss
          weight: [4.079461, 1.00718, 1.0, 3.01162, 1.73896, 3.00383] 
        - type: LovaszSoftmaxLoss
      coef: [0.5, 0.5]
  coef: [1]

model:
  type: FCN
  backbone:
    type: HRNet_W18
    align_corners: False
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz
  num_classes: 6
  pretrained: Null
  backbone_indices: [-1]

I modified the configuration file to match my desired settings (loss and heavy data augmentation).

Then, running the command python tools/train.py --config configs/quick_start/pp_fcn_hrnet_w18_custom_640x640_160k.yml --save_interval 300 --do_eval --use_vdl --save_dir custom_fcn_hrnet_w18, I get the following error:

Traceback (most recent call last):
  File "/project/PaddleSeg/tools/train.py", line 195, in <module>
    main(args)
  File "/project/PaddleSeg/tools/train.py", line 170, in main
    train(
  File "/project/PaddleSeg/paddleseg/core/train.py", line 227, in train
    logits_list = ddp_model(images) if nranks > 1 else model(images)
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/project/PaddleSeg/paddleseg/models/fcn.py", line 76, in forward
    feat_list = self.backbone(x)
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/project/PaddleSeg/paddleseg/models/backbones/hrnet.py", line 173, in forward
    conv1 = self.conv_layer1_1(x)
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/project/PaddleSeg/paddleseg/models/layers/layer_libs.py", line 55, in forward
    x = self._conv(x)
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/layer/conv.py", line 710, in forward
    out = F.conv._conv_nd(
  File "/home/user/.conda/envs/paddle_env/lib/python3.9/site-packages/paddle/nn/functional/conv.py", line 133, in _conv_nd
    pre_bias = _C_ops.conv2d(
ValueError: (InvalidArgument) The type of data we are trying to retrieve (float32) does not match the type of data (uint8) currently contained in the container.
  [Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():2 != phi::CppTypeToDataType<T>::Type():10.] (at ../paddle/phi/core/dense_tensor.cc:166)

I found some similar issues posted online, but they weren't fully resolved. One suggested running on Linux, which I already do. Another suggested that the error could be caused by MixedLoss. That might be true, because I modified the loss part of the cfg file.

------------Environment Information-------------
platform: Linux-3.10.0-1127.10.1.el7.x86_64-x86_64-with-glibc2.31
Python: 3.9.17 (main, Jul  5 2023, 20:41:20) [GCC 11.2.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29373293_0
cudnn: 8.1
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: Tesla V100-SXM2-32GB', 'GPU 1: Tesla V100-SXM2-32GB', 'GPU 2: Tesla V100-SXM2-32GB', 'GPU 3: Tesla V100-SXM2-32GB']
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PaddleSeg: 2.8.0
PaddlePaddle: 2.5.1
OpenCV: 4.5.5

Aug 25 '23 01:08 bit-scientist

Have you ever made changes to the training code? It appears that there is an error in the model's forward stage due to an incorrect data type for the input.
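Note that the traceback fails in the first convolution of the backbone (conv_layer1_1 in hrnet.py), before any loss is computed, so the uint8 tensor is most likely produced by the data pipeline rather than by the loss. A minimal sketch of how that can happen, using NumPy only: normalize and uint8_augment below are hypothetical stand-ins, not PaddleSeg functions, and whether any transform in the config above actually casts back to uint8 is an assumption that would need to be verified against the PaddleSeg source.

```python
import numpy as np


def normalize(img, mean=0.5, std=0.5):
    # Hypothetical stand-in for a Normalize step:
    # scale to [0, 1], then standardize. Output is float32.
    return ((img.astype(np.float32) / 255.0) - mean) / std


def uint8_augment(img):
    # Hypothetical augmentation that round-trips through uint8,
    # as some OpenCV/PIL-based operations do.
    return np.clip(img, 0, 255).astype(np.uint8)


def trace_dtypes(img, steps):
    # Apply each step in order and record the dtype it produces.
    history = []
    for step in steps:
        img = step(img)
        history.append((step.__name__, str(img.dtype)))
    return img, history


raw = np.full((4, 4, 3), 128, dtype=np.uint8)

# Normalize in the middle: the later uint8 step undoes the float32 conversion.
bad, bad_history = trace_dtypes(raw, [normalize, uint8_augment])

# Normalize last: the array handed to the network stays float32.
good, good_history = trace_dtypes(raw, [uint8_augment, normalize])

print(bad_history)
print(good_history)
```

If the same kind of per-step dtype trace is applied to the real transform list, the first step that reports uint8 after Normalize would be the place to look.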

Aug 26 '23 12:08 Asthestarsfalll

No, I haven't changed it. The only thing I changed is the cfg file.
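Since only the cfg file changed, one thing that may be worth checking is where Normalize sits in train_dataset.transforms: in the config above it is the third entry, with further augmentations listed after it. The sketch below is plain Python and independent of PaddleSeg; transforms_after_normalize is a hypothetical helper, the list simply mirrors the type names from the config, and it is not confirmed in this thread that the ordering is the cause of the error.

```python
def transforms_after_normalize(transforms):
    """Return the type names listed after the first Normalize entry."""
    names = [t["type"] for t in transforms]
    if "Normalize" not in names:
        return []
    return names[names.index("Normalize") + 1:]


# Type names copied from train_dataset.transforms in the config above
# (parameters omitted for brevity).
train_transforms = [
    {"type": "RandomHorizontalFlip"},
    {"type": "ResizeStepScaling"},
    {"type": "Normalize"},
    {"type": "RandomCenterCrop"},
    {"type": "RandomNoise"},
    {"type": "RandomBlur"},
    {"type": "RandomRotation"},
    {"type": "RandomScaleAspect"},
    {"type": "RandomDistort"},
    {"type": "RandomAffine"},
]

later = transforms_after_normalize(train_transforms)
print(len(later), "transforms run after Normalize:", later)
```

Moving Normalize to the end of the list, so that nothing runs on the image after it, would be a cheap experiment to rule this out.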

Aug 26 '23 13:08 bit-scientist

@bit-scientist How did you install PaddleSeg? If you installed it with pip, please try installing it from source instead.

Aug 29 '23 09:08 Asthestarsfalll

@Asthestarsfalll, I git-cloned PaddleSeg and installed the requirements first, then ran pip install -v -e . I may have run pip install paddleseg afterwards, but I can't recall now. Do you want me to do a clean install of everything?

Aug 29 '23 23:08 bit-scientist

Have you solved this problem? I am facing the same issue and our environment settings are identical.

Jul 22 '24 17:07 qijix

Thanks for this issue. As it has been inactive for a long time, we are closing it. If you have any questions, please feel free to reopen it or open a new issue, and we will follow up and resolve it.

Nov 13 '24 07:11 TingquanGao