tutorials
fork vs spawn on MacOS Python 3.9 error
Hi all,
I'm running macos conda-forge on the M1 architecture and testing the mednist_tutorial.ipynb and other jupyter notebooks. I get the following error
File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'MedNISTDataset' on <module '__main__' (built-in)>
This seems to be caused by the spawn start method, which became the default on macOS in Python 3.8. It is fixed by adding the argument multiprocessing_context="fork" to calls of torch.utils.data.DataLoader. However, I suggest that a deeper fix be made for non-experts.
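To illustrate why the spawn start method produces the AttributeError above: pickle records a class by its module and name rather than by its code, so a freshly started worker has to import that module to rebuild the dataset. The sketch below is stdlib-only and uses a stand-in class with no torch dependency. The module name mednist_dataset is hypothetical, and moving the class into an importable module is a commonly used workaround rather than something confirmed in this thread.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Hypothetical module contents: a stand-in for the notebook's dataset class.
MODULE_SOURCE = '''
class MedNISTDataset:
    def __init__(self, image_files, labels):
        self.image_files = image_files
        self.labels = labels

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        return self.image_files[index], self.labels[index]
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write the class to a real file so it lives in an importable module
    # instead of the notebook's __main__.
    pathlib.Path(tmp, "mednist_dataset.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    mod = importlib.import_module("mednist_dataset")

    ds = mod.MedNISTDataset(["a.png", "b.png"], [0, 1])
    # The payload stores "mednist_dataset" + "MedNISTDataset" by reference,
    # not the class body itself.
    payload = pickle.dumps(ds)
    restored = pickle.loads(payload)
    sys.path.remove(tmp)
```

In a notebook the equivalent step would be to save the class in a .py file next to the notebook and import it from there; whether that is enough for the tutorial in question has not been verified here.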
Steps to reproduce the behavior
1. On MacOS/python-forge
2. jupyter lab mednist_tutorial.ipynb
3. press the clear kernel and run all button
Expected behavior
A demo network should be set up and trained
Environment
% python --version
Python 3.9.10
% python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.9.dev2210
Numpy version: 1.22.3
Pytorch version: 1.10.2
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 1a660e6a7a50e985af5ff76b559baab44175438c
MONAI __file__: /opt/homebrew/Caskroom/miniforge/base/envs/pytorch_env/lib/python3.9/site-packages/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: NOT INSTALLED or UNKNOWN VERSION.
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 9.0.1
Tensorboard version: 2.8.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.63.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: 0.4.1
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
`psutil` required for `print_system_info`
================================
Printing GPU config...
================================
Num GPUs: 0
Has CUDA: False
cuDNN enabled: False
Thank you for a nice package, Jon
Hi @sporring ,
Thanks for the investigation and suggestion.
@yiheng-wang-nv @wyli I think maybe we can add the argument multiprocessing_context="fork" of torch.utils.data.DataLoader in some example or tutorial to mark the use case?
What do you think?
Thanks in advance.
I don't think we'll change the core codebase default value, so I'm converting this to a feature request to the tutorials...
I am a beginner and I have the same problem. Today I tried this:
train_ds = MedNISTDataset(train_x, train_y, train_transforms)
train_loader = torch.utils.data.DataLoader(
    train_ds, batch_size=300, shuffle=True, num_workers=10, multiprocessing_context="fork")
Which resulted in error:
ValueError: multiprocessing_context option should specify a valid start method in ['spawn'], but got multiprocessing_context='fork'
I checked the available methods and got this:
In[2]: import torch.multiprocessing as multiprocessing
multiprocessing.get_all_start_methods()
Out[3]: ['spawn']
What are the steps needed to get this 'fork' working?
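As a minimal sketch of how a notebook could avoid the ValueError above, the start method can be checked before it is passed to the DataLoader. The helper names pick_context and loader_kwargs are hypothetical, and the code only builds keyword arguments, so it does not import torch. It assumes that DataLoader rejects multiprocessing_context when num_workers is 0, which matches the PyTorch versions I am aware of.

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return `preferred` if this platform offers it, else None (library default)."""
    return preferred if preferred in mp.get_all_start_methods() else None


def loader_kwargs(num_workers, preferred="fork"):
    """Build worker-related keyword arguments for torch.utils.data.DataLoader."""
    kwargs = {"num_workers": num_workers}
    context = pick_context(preferred)
    # Only pass a context when worker processes are actually used and the
    # preferred start method exists on this OS (on Windows only 'spawn' does).
    if num_workers > 0 and context is not None:
        kwargs["multiprocessing_context"] = context
    return kwargs


# Intended use (not executed here, since it needs torch and a dataset):
# train_loader = torch.utils.data.DataLoader(
#     train_ds, batch_size=300, shuffle=True, **loader_kwargs(10))
```

On a platform where only 'spawn' is available this falls back to the default start method, so it does not by itself make fork work there.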
I feel maybe your OS doesn't support the fork method?
The initial issue seems to be a known PyTorch problem; you may find a solution or workaround here:
https://github.com/pytorch/pytorch/issues/70344
Thanks.
The linked bug is Mac-related, and I work on Windows.
python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.8.1
Numpy version: 1.22.3
Pytorch version: 1.9.0+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1
Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.2
scikit-image version: 0.19.2
Pillow version: 9.1.0
Tensorboard version: 2.8.0
gdown version: 4.4.0
TorchVision version: 0.10.0+cu111
tqdm version: 4.64.0
lmdb version: 1.3.0
psutil version: 5.9.0
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: 0.3.2
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
System: Windows
Win32 version: ('10', '10.0.18363', 'SP0', 'Multiprocessor Free')
Win32 edition: Enterprise
Platform: Windows-10-10.0.18363-SP0
Processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
Machine: AMD64
Python version: 3.9.5
Process name: python.exe
Command: ['C:\\Python39\\python.exe', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='C:\\WINDOWS\\System32\\en-US\\KernelBase.dll.mui', fd=-1), popenfile(path='C:\\WINDOWS\\System32\\en-US\\kernel32.dll.mui', fd=-1)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [29.7, 8.5, 36.3, 9.9, 17.3, 55.1, 25.7, 8.1, 26.0, 11.3, 20.8, 52.7]
CPU freq. (MHz): 2592
Load avg. in last 1, 5, 15 mins (%): [0.0, 0.0, 0.0]
Disk usage (%): 92.3
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.8
Available memory (GB): 17.6
Used memory (GB): 14.2
================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: Quadro P2000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 6
GPU 0 Total memory (GB): 4.0
GPU 0 CUDA capability (maj.min): 6.1
Are there any specific configuration options related to Windows and MONAI?
Windows doesn't support fork semantics natively. We've had issues with Windows before, and the advised solution is to use the local worker only, so train_loader = torch.utils.data.DataLoader(train_ds, batch_size=300, shuffle=True, num_workers=0).
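The advice above could be captured in a small helper so the same notebook cell runs on every platform. This is only a sketch: the function name effective_num_workers is hypothetical, and forcing 0 on Windows unconditionally is a simplification of the advice, which applies when worker processes cause errors.

```python
import platform


def effective_num_workers(requested, system=None):
    """Return 0 on Windows (load data in the main process), else `requested`.

    `system` defaults to platform.system() and exists mainly for testing.
    """
    system = system or platform.system()
    return 0 if system == "Windows" else requested


# Intended use (not executed here, since it needs torch and a dataset):
# train_loader = torch.utils.data.DataLoader(
#     train_ds, batch_size=300, shuffle=True,
#     num_workers=effective_num_workers(10))
```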
After I set num_workers=0, I could get a step further with the basic 2D segmentation example. I hit a problem during model training at the second step with RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 4.00 GiB total capacity; 2.70 GiB already allocated; 7.80 MiB free; 2.78 GiB reserved in total by PyTorch).
I fixed this by setting torch.device("cpu"). On the first run it took me 1 hour 32 minutes to train the simplest example on my CPU.
It would be nice to mention basic requirements for a local run and a simple Windows/Mac/Linux platform recommendation in the notebook.
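An alternative to falling back to the CPU after the out-of-memory error is to retry with a smaller batch. The sketch below is an assumption rather than something tried in this thread: the helper run_with_smaller_batches is hypothetical, it relies on the error being a RuntimeError whose message contains "out of memory" (as in the message quoted above), and nothing here shows that a reduced batch would fit on the 4 GiB GPU.

```python
def run_with_smaller_batches(step, batch_size, min_batch_size=1):
    """Call step(batch_size); on an out-of-memory RuntimeError, halve and retry.

    Returns (batch_size_that_worked, step_result). Any other error, or an
    out-of-memory error at `min_batch_size`, is re-raised unchanged.
    """
    while True:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    """Stand-in for one training step; pretends batches above 64 do not fit."""
    if batch_size > 64:
        raise RuntimeError("CUDA out of memory. (simulated)")
    return "ok"


used_batch_size, outcome = run_with_smaller_batches(fake_step, 300)
```

In a real notebook, step would rebuild the DataLoader with the given batch size and run one training iteration.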
Hi @ericspod @wyli ,
I think maybe we can add some description about the platform in the requirements of README doc: https://github.com/Project-MONAI/tutorials/blob/master/README.md#1-requirements What do you think?
Thanks in advance.
We should add something there and a little warning about Windows behaviour.
Hi @ericspod ,
Would you like to contribute a PR for it? I think you know more details about the platforms.
Thanks.
I'm not really that familiar with the runtime costs of the tutorials, and I'm not sure whether Richard is. For the Windows issue I'd just add "Windows users may need to set the num_workers argument of DataLoader to 0 if errors are encountered during training."
OK, maybe @wyli knows more details from CI environment.
Thanks.
Yes, we currently only have basic unit tests for Windows. Most of the integration, multi-processing, and file system access tests are skipped on Windows. We should spend more effort on this if there is enough user interest...