Human-Segmentation-PyTorch

Training from scratch fails

Open b-hakim opened this issue 5 years ago • 3 comments

I am having this issue when running training from scratch

Traceback (most recent call last):
  File "train.py", line 101, in <module>
    main(config, args.resume)
  File "train.py", line 55, in main
    trainer.train()
  File "/path/Human-Segmentation-PyTorch/base/base_trainer.py", line 95, in train
    result = self._train_epoch(epoch)
  File "/path/Human-Segmentation-PyTorch/trainer/trainer.py", line 81, in _train_epoch
    loss = self.loss(output, target)
  File "/path/Human-Segmentation-PyTorch/evaluation/losses.py", line 18, in dice_loss
    targets = torch.zeros_like(logits).scatter_(dim=1, index=targets.type(torch.int64), src=torch.tensor(1.0))
RuntimeError: Index tensor must have the same number of dimensions as src tensor

I suspect the problem is in my dataset labels. What is the correct format? Each mask is a single-channel image with the same size as the original image. Since I have 2 classes, the mask contains only 0 and 1 (or 0 and 255).
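For reference, the scatter_ call in the traceback uses the mask values as class indices along dim=1, so with 2 classes every value has to be 0 or 1; a value of 255 would point past the last channel. The thread does not confirm the format the repository expects, so the following is only a sketch of mapping a 0/255 mask to class ids. The helper name mask_to_class_ids is hypothetical and not part of the repository.

```python
import torch

def mask_to_class_ids(mask):
    """Map a binary mask (foreground stored as 1 or 255) to class ids 0/1.

    Hypothetical helper: any non-zero pixel is treated as foreground.
    """
    return (mask > 0).to(torch.int64)

# A tiny 2x2 mask that stores foreground as 255.
mask_255 = torch.tensor([[0, 255], [255, 0]], dtype=torch.uint8)
class_ids = mask_to_class_ids(mask_255)
```

After this mapping the mask holds only valid indices for a 2-channel output and already has the int64 dtype that scatter_ needs for its index argument.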

b-hakim avatar Sep 18 '20 18:09 b-hakim

I changed that line to

targets = torch.zeros_like(logits).scatter_(dim=1, index=targets.type(torch.int64), src=torch.ones_like(logits))

so that index and src have the same number of dimensions. It seems to work for me.
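A minimal sketch of the one-hot step, assuming logits has shape (N, C, H, W) and the mask arrives as (N, H, W) class ids; neither shape is confirmed in this thread, and the function name one_hot_targets is hypothetical. It also shows the scalar form of scatter_, which avoids building a src tensor at all and should give the same result.

```python
import torch

def one_hot_targets(logits, targets):
    """One-hot encode class ids along the channel dimension.

    Assumed shapes: logits (N, C, H, W), targets (N, H, W).
    """
    index = targets.unsqueeze(1).to(torch.int64)  # (N, 1, H, W)
    # Scalar form: write 1.0 at every position selected by index.
    return torch.zeros_like(logits).scatter_(1, index, 1.0)

logits = torch.randn(2, 2, 4, 4)
targets = torch.randint(0, 2, (2, 4, 4))

one_hot = one_hot_targets(logits, targets)

# The variant from the comment above: src has the same number of
# dimensions as index, which is what the error message asks for.
index = targets.unsqueeze(1).to(torch.int64)
one_hot_src = torch.zeros_like(logits).scatter_(
    dim=1, index=index, src=torch.ones_like(logits)
)
```

Both variants produce a tensor with the same shape as logits, with a single 1 per pixel in the channel that matches the class id.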

FrankChengGD avatar Nov 23 '20 06:11 FrankChengGD

I had the same problem. Try installing the older versions with 'pip install torch==1.2.0 torchvision==0.4.0' to solve it.

HettieLi avatar Sep 27 '22 09:09 HettieLi

type(torch.int64)

Hello, I tried changing the line to your code, but now I get this error:

    inter = (outputs & targets).type(torch.float32).sum(dim=(2,3))
RuntimeError: "bitwise_and_cuda" not implemented for 'Float'
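The message says that the & operator is not implemented for float tensors; bitwise operators in PyTorch work on integer and boolean tensors. One possible way around it is to cast the one-hot tensors to bool before combining them. The sketch below is an assumption about what the surrounding code computes (an intersection over union), not the repository's actual function; the names one_hot and iou_sketch are hypothetical.

```python
import torch

def one_hot(index, num_classes):
    """Turn (N, 1, H, W) int64 class ids into a (N, C, H, W) bool tensor."""
    n, _, h, w = index.shape
    zeros = torch.zeros(n, num_classes, h, w, device=index.device)
    return zeros.scatter_(1, index, 1.0).bool()

def iou_sketch(logits, targets, eps=1e-6):
    """Mean IoU over classes and images; assumes targets is (N, H, W)."""
    num_classes = logits.shape[1]
    pred = torch.argmax(logits, dim=1, keepdim=True)
    outputs = one_hot(pred, num_classes)
    targets = one_hot(targets.unsqueeze(1).to(torch.int64), num_classes)
    # & and | are defined for bool tensors, so this no longer raises.
    inter = (outputs & targets).type(torch.float32).sum(dim=(2, 3))
    union = (outputs | targets).type(torch.float32).sum(dim=(2, 3))
    return (inter / (union + eps)).mean()

# Tiny check: logits whose argmax matches the mask exactly.
mask = torch.tensor([[[0, 1], [1, 0]]])
perfect_logits = torch.stack([(mask == 0).float(), (mask == 1).float()], dim=1)
score = iou_sketch(perfect_logits, mask)
```

Casting to an integer type such as torch.int8 instead of bool would also make & and | valid; bool is used here only because the tensors hold 0/1 values.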

XXMxxm220 avatar Aug 01 '23 03:08 XXMxxm220