RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

Open 514tzy opened this issue 10 months ago • 5 comments

I am using an MSD (Medical Segmentation Decathlon) dataset. Dataset conversion and preprocessing completed without problems, but training fails with the error below and I don't know why. Can someone help? Thank you.

(bratsnnunet) PS E:\nnUNet-master> nnUNetv2_train 4 2d 0

############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2024-04-21 15:19:52.712785: do_dummy_2d_data_aug: False
2024-04-21 15:19:52.713802: Creating new 5-fold cross-validation split...
2024-04-21 15:19:52.716793: Desired fold for training: 0
2024-04-21 15:19:52.716793: This split has 208 training and 52 validation cases.
using pin_memory on device 0
using pin_memory on device 0
2024-04-21 15:20:11.474595: Using torch.compile...
E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\optim\lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "

This is the configuration used by this training:
Configuration name: 2d
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 366, 'patch_size': [56, 40], 'median_image_size_in_voxels': [50.0, 35.0], 'spacing': [1.0, 1.0], 'normalization_schemes': ['ZScoreNormalization'], 'use_mask_for_norm': [False], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'architecture': {'network_class_name': 'dynamic_network_architectures.architectures.unet.PlainConvUNet', 'arch_kwargs': {'n_stages': 4, 'features_per_stage': [32, 64, 128, 256], 'conv_op': 'torch.nn.modules.conv.Conv2d', 'kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3]], 'strides': [[1, 1], [2, 2], [2, 2], [2, 2]], 'n_conv_per_stage': [2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2], 'conv_bias': True, 'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm2d', 'norm_op_kwargs': {'eps': 1e-05, 'affine': True}, 'dropout_op': None, 'dropout_op_kwargs': None, 'nonlin': 'torch.nn.LeakyReLU', 'nonlin_kwargs': {'inplace': True}, 'deep_supervision': True}, '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']}, 'batch_dice': True}

These are the global plan.json settings:
{'dataset_name': 'Dataset004_Hippocampus', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 1.0, 1.0], 'original_median_shape_after_transp': [36, 50, 35], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 486420.21875, 'mean': 22360.326171875, 'median': 362.88250732421875, 'min': 0.0, 'percentile_00_5': 28.0, 'percentile_99_5': 277682.03125, 'std': 60656.1328125}}}

2024-04-21 15:20:13.027511: unpacking dataset...
2024-04-21 15:20:13.490002: unpacking done...
2024-04-21 15:20:13.494466: Unable to plot network architecture: nnUNet_compile is enabled!
2024-04-21 15:20:13.520129:
2024-04-21 15:20:13.520129: Epoch 0
2024-04-21 15:20:13.520129: Current learning rate: 0.01
Traceback (most recent call last):
  File "E:\anaconda3\envs\bratsnnunet\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\anaconda3\envs\bratsnnunet\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\anaconda3\envs\bratsnnunet\Scripts\nnUNetv2_train.exe\__main__.py", line 7, in <module>
  File "E:\nnUNet-master\nnunetv2\run\run_training.py", line 274, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "E:\nnUNet-master\nnunetv2\run\run_training.py", line 210, in run_training
    nnunet_trainer.run_training()
  File "E:\nnUNet-master\nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py", line 1295, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "E:\nnUNet-master\nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py", line 922, in train_step
    output = self.network(data)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\_dynamo\eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
Traceback (most recent call last):
  File "E:\anaconda3\envs\bratsnnunet\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "E:\anaconda3\envs\bratsnnunet\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "E:\anaconda3\envs\bratsnnunet\lib\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

514tzy, Apr 21 '24 07:04

I'm having the same problem, probably caused by the recent update. Do you have a solution?

york-yan, Apr 21 '24 12:04

york-yan wrote: "I'm having the same problem, probably due to the recent update, do you have any solution?"

I think so too. The version from early April still works.

514tzy, Apr 21 '24 12:04

Which version works? I have the same problem and haven't been able to fix it.

xujiangyu, May 01 '24 14:05

The error message seems incomplete. Is this the entire output of the model? I cannot see an actual error that would point to the problem. Can you try setting the environment variable nnUNet_compile=f and try again?
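For anyone trying this suggestion: the log above prints "Using torch.compile...", so compilation was active in the failing run. Below is a minimal sketch of one way to rerun the same command with `nnUNet_compile=f` set only for the child process. The helper names (`build_env`, `build_training_command`, `run_training_without_compile`) are made up for illustration and are not part of nnU-Net; the sketch also assumes `nnUNetv2_train` is on the PATH of the active environment.

```python
import os
import subprocess


def build_env(compile_flag: str = "f") -> dict:
    """Return a copy of the current environment with nnUNet_compile set.

    The value "f" is the one suggested in this thread; the parent
    process environment is left untouched.
    """
    env = os.environ.copy()
    env["nnUNet_compile"] = compile_flag
    return env


def build_training_command(dataset: str = "4", configuration: str = "2d", fold: str = "0") -> list:
    """Build the same command line used in the original report."""
    return ["nnUNetv2_train", dataset, configuration, fold]


def run_training_without_compile() -> int:
    """Launch training with nnUNet_compile=f and return the exit code.

    Assumes nnUNetv2_train is installed in the active environment.
    """
    completed = subprocess.run(build_training_command(), env=build_env("f"))
    return completed.returncode


# Example (not executed here, because it starts a real training run):
# exit_code = run_training_without_compile()
```

In the PowerShell session shown in the report, the equivalent is to run `$env:nnUNet_compile = "f"` and then `nnUNetv2_train 4 2d 0` in the same session. If the run still fails, the full console output, including anything printed before the first traceback, is what is needed to find the actual error.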

FabianIsensee, May 08 '24 09:05

Any update?

FabianIsensee, Jul 25 '24 11:07