Can't pickle local object while training RandLANet on S3DIS. I use PyTorch.
Steps to reproduce the bug
import open3d.ml as _ml3d
import open3d.ml.torch as ml3d
model = ml3d.models.RandLANet()
dataset_path = "/Users/kimd999/research/projects/Danny/files/public_dataset/S3DIS/Stanford3dDataset_v1.2_Aligned_Version"
dataset = ml3d.datasets.S3DIS(dataset_path=dataset_path, use_cache=True)
pipeline = ml3d.pipelines.SemanticSegmentation(model=model, dataset=dataset, max_epoch=100)
# prints training progress in the console.
pipeline.run_train()
Error message
INFO - 2022-02-15 13:24:10,927 - semantic_segmentation - DEVICE : cpu
INFO - 2022-02-15 13:24:10,927 - semantic_segmentation - Logging in file : ./logs/RandLANet_S3DIS_torch/log_train_2022-02-15_13:24:10.txt
INFO - 2022-02-15 13:24:10,929 - s3dis - Found 249 pointclouds for train
INFO - 2022-02-15 13:24:10,935 - s3dis - Found 23 pointclouds for validation
INFO - 2022-02-15 13:24:10,937 - semantic_segmentation - Initializing from scratch.
INFO - 2022-02-15 13:24:10,940 - semantic_segmentation - Writing summary in train_log/00003_RandLANet_S3DIS_torch.
INFO - 2022-02-15 13:24:10,940 - semantic_segmentation - Started training
INFO - 2022-02-15 13:24:10,940 - semantic_segmentation - === EPOCH 0/100 ===
training: 0%| | 0/63 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_model_for_semantic_segmentation.py", line 19, in <module>
pipeline.run_train()
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 394, in run_train
for step, inputs in enumerate(tqdm(train_loader, desc='training')):
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
return self._get_iterator()
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
w.start()
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/kimd999/bin/miniconda3/envs/open3d/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
Expected behavior
No response
Open3D, Python and System information
- Operating system: OSX 10.15.7
- Python version: Python 3.8.12
- Open3D version: 0.14.1
- System type: x86
- Is this remote workstation?: no
- How did you install Open3D?: pip install open3d
@kimdn This seems to be caused by how PyTorch's DataLoader spawns worker processes when num_workers > 0; see this thread: https://github.com/pyg-team/pytorch_geometric/issues/366.
Try setting num_workers=0 in your pipeline definition like so:
pipeline = ml3d.pipelines.SemanticSegmentation(model=model, dataset=dataset, max_epoch=100, num_workers=0)
I guess it is not a great solution if you intend to have num_workers > 0, but hopefully it will at least resolve the error message!
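The root cause can be reproduced without Open3D at all: with num_workers > 0, PyTorch creates worker processes via multiprocessing, and the spawn/forkserver start methods (the default on macOS since Python 3.8) must pickle the sampler, but Python cannot pickle a function defined inside another function. A minimal sketch, with hypothetical function names standing in for the Open3D-ML sampler:

```python
import pickle

def get_point_sampler():
    """Return a sampler function defined in a local scope."""
    def _random_centered_gen():  # hypothetical stand-in for the Open3D-ML sampler
        return 42
    return _random_centered_gen

sampler = get_point_sampler()
try:
    # pickle stores functions by qualified name; a local object like
    # 'get_point_sampler.<locals>._random_centered_gen' cannot be imported
    # in the worker process, so pickling fails.
    pickle.dumps(sampler)
except AttributeError as err:
    print("pickling failed:", err)
```

This also explains why the error shows up on macOS but not necessarily on Linux: Linux defaults to the fork start method, which copies the parent process and does not need to pickle the sampler.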
I used WSL Ubuntu to train the models. num_workers > 0 worked for RandLA-Net but not for KPConv, which was very strange.
But at least it proved that multiprocessing can work in this virtual environment. Do you have any ideas about the difference between the two models' deployments?
I have set num_workers to 0, but I still hit this bug. Do you know how to solve it?
python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_toronto3d.yml --dataset.dataset_path dataset/Toronto_3D --pipeline SemanticSegmentation --dataset.use_cache True --num_workers 0
INFO - 2022-12-09 17:31:29,220 - semantic_segmentation - === EPOCH 0/200 ===
training: 0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/export/home2/hanxiaobing/Documents/Open3D-ML-code/Open3D-ML/scripts/run_pipeline.py", line 246, in <module>
sys.exit(main())
File "/export/home2/hanxiaobing/Documents/Open3D-ML-code/Open3D-ML/scripts/run_pipeline.py", line 180, in main
pipeline.run_train()
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 406, in run_train
for step, inputs in enumerate(tqdm(train_loader, desc='training')):
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
return self._get_iterator()
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 384, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1048, in __init__
w.start()
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_forkserver.py", line 47, in _launch
reduction.dump(process_obj, buf)
File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Hi, I found a solution: add "num_workers: 0" and "pin_memory: false" under the "pipeline" section of the ".yml" config file. Solution link: https://blog.csdn.net/weixin_40653140/article/details/130492849
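For reference, the relevant part of the config would look roughly like the fragment below. This is a sketch based on the comment above, not a verified copy of randlanet_toronto3d.yml; the surrounding keys may differ between Open3D-ML versions.

```yaml
pipeline:
  name: SemanticSegmentation
  # Disable DataLoader worker processes so the sampler is never pickled.
  num_workers: 0
  pin_memory: false
```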