fork vs spawn on MacOS Python 3.9 error

Hi all,

I'm running conda-forge on macOS on the M1 architecture and testing mednist_tutorial.ipynb and other Jupyter notebooks. I get the following error:

  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'MedNISTDataset' on <module '__main__' (built-in)>

This seems to be caused by the spawn start method, which became the default on macOS in Python 3.8. It is fixed by adding the argument multiprocessing_context="fork" to calls of torch.utils.data.DataLoader; however, I suggest that a deeper fix be made for non-experts.
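A minimal sketch of that workaround, assuming the caller wants to request "fork" only where the platform offers it. The helper name fork_context_kwargs is hypothetical and not part of MONAI or PyTorch; the DataLoader call is shown as a comment because train_ds comes from the notebook.

```python
import multiprocessing


def fork_context_kwargs():
    """Hypothetical helper: extra DataLoader kwargs that request the "fork" start method.

    Returns an empty dict where "fork" is not offered (e.g. Windows), so the
    DataLoader keeps the platform default instead of raising a ValueError.
    """
    if "fork" in multiprocessing.get_all_start_methods():
        return {"multiprocessing_context": "fork"}
    return {}


# Intended use (not executed here; train_ds is the notebook's dataset):
# train_loader = torch.utils.data.DataLoader(
#     train_ds, batch_size=300, shuffle=True, num_workers=10, **fork_context_kwargs())
```

Where the returned dict is empty, the loader still uses the platform's default start method, so the pickling error can still appear when num_workers is greater than 0.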

Steps to reproduce the behavior
1. On MacOS/python-forge
2. jupyter lab mednist_tutorial.ipynb
3. press the clear kernel and run all button

Expected behavior
A demo network should be set up and trained

Environment

% python --version
Python 3.9.10
% python -c 'import monai; monai.config.print_debug_info()'

================================
Printing MONAI config...
================================
MONAI version: 0.9.dev2210
Numpy version: 1.22.3
Pytorch version: 1.10.2
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 1a660e6a7a50e985af5ff76b559baab44175438c
MONAI __file__: /opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: NOT INSTALLED or UNKNOWN VERSION.
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 9.0.1
Tensorboard version: 2.8.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.63.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: 0.4.1
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
`psutil` required for `print_system_info`

================================
Printing GPU config...
================================
Num GPUs: 0
Has CUDA: False
cuDNN enabled: False

Thank you for a nice package, Jon

Mar 14 '22 11:03 sporring

Hi @sporring ,

Thanks for the investigation and suggestion. @yiheng-wang-nv @wyli I think maybe we can add the multiprocessing_context="fork" argument of torch.utils.data.DataLoader to some example or tutorial to document this use case? What do you think?

Thanks in advance.

Mar 16 '22 12:03 Nic-Ma

I don't think we'll change the core codebase default value, so I'm converting this to a feature request to the tutorials...

Mar 23 '22 10:03 wyli

I am a beginner and I have the same problem. Today I tried this:

train_ds = MedNISTDataset(train_x, train_y, train_transforms)
train_loader = torch.utils.data.DataLoader(
    train_ds, batch_size=300, shuffle=True, num_workers=10, multiprocessing_context="fork")

Which resulted in error:

ValueError: multiprocessing_context option should specify a valid start method in ['spawn'], but got multiprocessing_context='fork'

I checked the available methods and got this:

In[2]: import torch.multiprocessing as multiprocessing
multiprocessing.get_all_start_methods()
Out[3]: ['spawn']

What are the steps needed to get this 'fork' working?

Apr 20 '22 13:04 Johnz86

I suspect your OS doesn't support the fork method. The initial issue looks like a known PyTorch problem; you may find a solution or workaround here: https://github.com/pytorch/pytorch/issues/70344

Thanks.

Apr 20 '22 14:04 Nic-Ma

The linked bug is Mac related and I work on Windows.

python -c 'import monai; monai.config.print_debug_info()'

================================
Printing MONAI config...
================================
MONAI version: 0.8.1
Numpy version: 1.22.3
Pytorch version: 1.9.0+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False    
MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1

Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.2
scikit-image version: 0.19.2
Pillow version: 9.1.0
Tensorboard version: 2.8.0
gdown version: 4.4.0
TorchVision version: 0.10.0+cu111
tqdm version: 4.64.0
lmdb version: 1.3.0
psutil version: 5.9.0
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: 0.3.2
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Windows
Win32 version: ('10', '10.0.18363', 'SP0', 'Multiprocessor Free')
Win32 edition: Enterprise
Platform: Windows-10-10.0.18363-SP0
Processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
Machine: AMD64
Python version: 3.9.5
Process name: python.exe
Command: ['C:\\Python39\\python.exe', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='C:\\WINDOWS\\System32\\en-US\\KernelBase.dll.mui', fd=-1), popenfile(path='C:\\WINDOWS\\System32\\en-US\\kernel32.dll.mui', fd=-1)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [29.7, 8.5, 36.3, 9.9, 17.3, 55.1, 25.7, 8.1, 26.0, 11.3, 20.8, 52.7]
CPU freq. (MHz): 2592
Load avg. in last 1, 5, 15 mins (%): [0.0, 0.0, 0.0]
Disk usage (%): 92.3
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.8
Available memory (GB): 17.6
Used memory (GB): 14.2

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: Quadro P2000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 6
GPU 0 Total memory (GB): 4.0
GPU 0 CUDA capability (maj.min): 6.1

Are there any specific configuration options related to windows and monai?

Apr 20 '22 15:04 Johnz86

Windows doesn't support fork semantics natively. We've had issues with Windows before and have advised the solution is to use the local worker only, so train_loader = torch.utils.data.DataLoader(train_ds, batch_size=300, shuffle=True, num_workers=0).
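As a rough sketch of that advice, the worker count could be chosen per platform before the loader is built. The helper choose_num_workers and its system parameter are hypothetical names used only for illustration.

```python
import platform


def choose_num_workers(requested, system=None):
    """Hypothetical helper: worker count to pass to DataLoader.

    On Windows, return 0 so that data is loaded in the main process;
    on other systems, keep the requested number of workers.
    """
    system = platform.system() if system is None else system
    return 0 if system == "Windows" else requested


# Intended use (not executed here; train_ds is the notebook's dataset):
# train_loader = torch.utils.data.DataLoader(
#     train_ds, batch_size=300, shuffle=True, num_workers=choose_num_workers(10))
```

Loading in the main process avoids starting worker processes at all, at the cost of slower data loading.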

Apr 20 '22 16:04 ericspod

After I set num_workers=0, I could get a step further with the basic 2d segmentation example. I hit a problem during model training at the second step with RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 4.00 GiB total capacity; 2.70 GiB already allocated; 7.80 MiB free; 2.78 GiB reserved in total by PyTorch). I fixed this by setting torch.device("cpu"). On the first run it took me 1 hour 32 minutes to train the simplest example on my CPU. It would be nice to mention the basic requirements for a local run and a simple Windows/Mac/Linux platform recommendation in the notebook.

Apr 21 '22 12:04 Johnz86

Hi @ericspod @wyli ,

I think maybe we can add some description about the platform in the requirements of README doc: https://github.com/Project-MONAI/tutorials/blob/master/README.md#1-requirements What do you think?

Thanks in advance.

Apr 21 '22 13:04 Nic-Ma

We should add something there and a little warning about Windows behaviour.

Apr 21 '22 13:04 ericspod

Hi @ericspod ,

Would you like to contribute a PR for it? I think you know more details about the platforms.

Thanks.

Apr 21 '22 13:04 Nic-Ma

I'm not really that familiar with the runtime costs of the tutorials, and I'm not sure whether Richard is. For the Windows issue I'd just add "Windows users may need to set the num_workers argument of DataLoader to 0 if errors are encountered during training."

Apr 21 '22 14:04 ericspod

OK, maybe @wyli knows more details from CI environment.

Thanks.

Apr 21 '22 14:04 Nic-Ma

Yes, we currently only have basic unit tests for Windows. Most of the integration, multi-processing, and file system access tests are skipped on Windows. We should spend more effort on this if there is enough user interest...

Apr 25 '22 08:04 wyli
