segmentation_models.pytorch
Error running binary_segmentation_intro.ipynb
On SageMaker Studio (ml.g4dn.xlarge instance, PyTorch 1.10 kernel), the notebook raises an error at trainer.fit:
/opt/conda/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
432 # instantiate since we don't know how to
433 raise RuntimeError(msg) from None
--> 434 raise exception
435
436
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 295, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/auto_restart.py", line 474, in _capture_metadata_collate
data = default_collate(samples)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 64, in default_collate
return default_collate([torch.as_tensor(b) for b in batch])
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 54, in default_collate
storage = elem.storage()._new_shared(numel)
File "/opt/conda/lib/python3.8/site-packages/torch/storage.py", line 157, in _new_shared
return cls._new_using_fd(size)
RuntimeError: falseINTERNAL ASSERT FAILED at "/codebuild/output/src741569495/src/aten/src/ATen/MapAllocator.cpp":300, please report a bug to PyTorch. unable to write to file </torch_1110_0>
This appears to be the same problem as https://github.com/pytorch/pytorch/issues/68501
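The traceback fails inside `_new_shared`, where a DataLoader worker allocates shared memory for the collated batch. A minimal sketch for checking how much space the shared-memory mount has, assuming the failure is related to shared-memory space (the thread does not confirm this) and that the mount lives at `/dev/shm` as on most Linux systems. The helper name `shm_free_bytes` is hypothetical:

```python
import os
import shutil


def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the shared-memory mount, or None if it is absent.

    The default path is an assumption: /dev/shm is the usual location on
    Linux, but a given container image may differ.
    """
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free


free = shm_free_bytes()
if free is None:
    print("no /dev/shm mount found")
else:
    print(f"/dev/shm free: {free / 2**20:.0f} MiB")
```

If the mount turns out to be very small, one thing to try is passing `num_workers=0` to the `DataLoader`, which keeps collation in the main process and skips the shared-memory path shown in the traceback, at the cost of slower data loading.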
Forcing an upgrade of torch from 1.10.2+cu113 now produces a conflict with Horovod:
HorovodVersionMismatchError: Framework pytorch installed with version 1.10.2+cu113 but found version 1.12.1+cu102.
This can result in unexpected behavior including runtime errors.
Reinstall Horovod using `pip install --no-cache-dir` to build with the new version.
I still get this Horovod error even with the following install recipe:
!pip install torch --upgrade
!pip install segmentation-models-pytorch
!pip install pytorch-lightning==1.5.4
!pip install --no-cache-dir horovod[pytorch]
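To see which versions actually ended up installed after a recipe like the one above, a small check along these lines may help. It is only a sketch using the standard library; the helper name `installed_versions` is hypothetical, and the package names are taken from the pip commands in this thread:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as they appear in the pip commands above.
report = installed_versions(["torch", "horovod", "pytorch-lightning"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

This only reports what pip has installed. It does not show which torch version Horovod was compiled against, which is what the `HorovodVersionMismatchError` compares.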
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.