
a problem in training

Open RezaRazmara opened this issue 3 years ago • 6 comments

I started to create the dataset. The label pixels had the values 0.0 for background, 63.0 for liver, 126.0 for spleen, 189.0 for the left kidney, and 252.0 for the right kidney, so I remapped them in code to the scalars 0, 1, 2, 3, 4. The preprocessing phase finished correctly without any issue, but in the training phase I get this error:

File "/usr/local/bin/nnUNet_train", line 8, in sys.exit(main()) File "/usr/local/lib/python3.7/dist-packages/nnunet/run/run_training.py", line 179, in main trainer.run_training() File "/usr/local/lib/python3.7/dist-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training ret = super().run_training() File "/usr/local/lib/python3.7/dist-packages/nnunet/training/network_training/nnUNetTrainer.py", line 317, in run_training super(nnUNetTrainer, self).run_training() File "/usr/local/lib/python3.7/dist-packages/nnunet/training/network_training/network_trainer.py", line 456, in run_training l = self.run_iteration(self.tr_gen, True) File "/usr/local/lib/python3.7/dist-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 249, in run_iteration l = self.loss(output, target) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/nnunet/training/loss_functions/deep_supervision.py", line 39, in forward l = weights[0] * self.loss(x[0], y[0]) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/nnunet/training/loss_functions/dice_loss.py", line 363, in forward dc_loss = self.dc(net_output, target, loss_mask=mask) if self.weight_dice != 0 else 0 File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in call_impl return forward_call(*input, **kwargs) File "/usr/local/lib/python3.7/dist-packages/nnunet/training/loss_functions/dice_loss.py", line 195, in forward tp, fp, fn, _ = get_tp_fp_fn_tn(x, y, axes, loss_mask, False) File "/usr/local/lib/python3.7/dist-packages/nnunet/training/loss_functions/dice_loss.py", line 145, in get_tp_fp_fn_tn y_onehot.scatter(1, gt, 1) RuntimeError: index 63 is out of bounds for dimension 1 with size 5

It confused me!
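For reference, the label remapping described above can be sketched roughly like this (a minimal, hypothetical example assuming NIfTI label files processed with nibabel and numpy; the filenames are placeholders, the value mapping is the one listed above):

    import nibabel as nib
    import numpy as np

    # original intensity-coded labels -> consecutive class indices expected by nnU-Net
    mapping = {0: 0, 63: 1, 126: 2, 189: 3, 252: 4}

    img = nib.load("IMG_001.nii.gz")                     # placeholder filename
    data = np.rint(img.get_fdata()).astype(np.uint16)    # round away float noise before comparing

    # warn about any value not covered by the mapping (it would otherwise silently become background)
    unexpected = np.setdiff1d(np.unique(data), list(mapping))
    if unexpected.size:
        print("unexpected label values:", unexpected)

    remapped = np.zeros(data.shape, dtype=np.uint8)
    for old, new in mapping.items():
        remapped[data == old] = new

    nib.save(nib.Nifti1Image(remapped, img.affine), "IMG_001_fixed.nii.gz")

The RuntimeError above (index 63 is out of bounds for dimension 1 with size 5) indicates that at least one voxel reaching the loss still carries the raw value 63 rather than a class index in [0, 4].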

RezaRazmara avatar Mar 08 '22 16:03 RezaRazmara

It seems like you still have values in your segmentation that are not supposed to be there. Please run nnUNet_plan_and_preprocess with the --verify_dataset_integrity flag.
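For reference, with nnU-Net v1 that check is run together with planning and preprocessing; a typical invocation looks like this (the task ID 500 is just a placeholder for your own task):

    nnUNet_plan_and_preprocess -t 500 --verify_dataset_integrity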

FabianIsensee avatar Mar 09 '22 06:03 FabianIsensee

Thank you Fabian for the reply. I did that before and it seems OK:

    Verifying training set
    checking case IMG_001
    checking case IMG_002
    checking case IMG_003
    checking case IMG_005
    checking case IMG_008
    checking case IMG_010
    checking case IMG_013
    checking case IMG_015
    checking case IMG_019
    checking case IMG_020
    checking case IMG_021
    checking case IMG_022
    checking case IMG_031
    checking case IMG_032
    checking case IMG_033
    checking case IMG_034
    checking case IMG_036
    checking case IMG_037
    checking case IMG_038
    checking case IMG_039
    Verifying label values
    Expected label values are [0, 1, 2, 3, 4]
    Labels OK
    Dataset OK
    IMG_001 IMG_002 IMG_003 IMG_005 IMG_008 IMG_010 IMG_013 IMG_015 IMG_019 IMG_020 IMG_021 IMG_022 IMG_031 IMG_032 IMG_033 IMG_034 IMG_036 IMG_037 IMG_038 IMG_039

RezaRazmara avatar Mar 09 '22 07:03 RezaRazmara

Hm, in that case it's really hard to say. Would you be able to share your dataset?

FabianIsensee avatar Mar 09 '22 08:03 FabianIsensee

The problem is solved, dear Fabian: I deleted the nnUNet_cropped_data folder and reran the preprocessing stage. But now there is a new problem: the training gets stuck at epoch 0, printing the warning shown below.
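For anyone hitting the same leftover-crops issue: the cleanup amounted to removing the stale crops for the task and redoing preprocessing, roughly like this (a sketch assuming the usual nnU-Net v1 environment variables; the task folder name and ID are placeholders):

    rm -rf "$nnUNet_raw_data_base/nnUNet_cropped_data/Task500_MyTask"
    nnUNet_plan_and_preprocess -t 500 --verify_dataset_integrity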

    2022-03-09 10:29:04.543689: epoch: 0
    /usr/local/lib/python3.7/dist-packages/torch/autocast_mode.py:141: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
      warnings.warn('User provided device_type of 'cuda', but CUDA is not available. Disabling')

RezaRazmara avatar Mar 09 '22 10:03 RezaRazmara

The dataset is here: https://github.com/RezaRazmara/segmentation_dataset. Thanks for your help.

RezaRazmara avatar Mar 09 '22 10:03 RezaRazmara

Hi, your GPU is not set up correctly. I cannot help you with that - please google the issue :-) There are plenty of threads about this already.
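As a quick sanity check (a minimal sketch using only the standard PyTorch API), you can confirm whether the installed torch build actually sees a GPU:

    import torch

    print(torch.__version__)            # e.g. a "+cpu" suffix means a CPU-only build
    print(torch.version.cuda)           # None for a CPU-only build
    print(torch.cuda.is_available())    # must be True for GPU training

If is_available() returns False, the fix is usually to install a CUDA-enabled PyTorch build that matches your driver, rather than anything nnU-Net-specific.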

FabianIsensee avatar Mar 09 '22 11:03 FabianIsensee

Hey @RezaRazmara

Did you manage to get the training running on your GPU or do you have any follow-up questions? Otherwise I would close this issue.

mrokuss avatar Jul 31 '23 13:07 mrokuss