yolact
PyTorch 1.10 training fails with "Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu"."
How do I deal with this error: "Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match."? I would appreciate an answer and a way to fix it.
Hello, I have the same issue
Windows CUDA 11.3
(yolo_env) C:\User_Data\Instance Segmentation\yolact>python ./train.py --config=yolact_resnet50_philips_config
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit\_recursive.py:234: UserWarning: 'lat_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit\_recursive.py:234: UserWarning: 'downsample_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit\_recursive.py:234: UserWarning: 'pred_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
Initializing weights...
Traceback (most recent call last):
  File "C:\User_Data\Instance Segmentation\yolact\train.py", line 503, in <module>
    train()
  File "C:\User_Data\Instance Segmentation\yolact\train.py", line 213, in train
    yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path)
  File "C:\User_Data\Instance Segmentation\yolact\yolact.py", line 495, in init_weights
    self.backbone.init_backbone(backbone_path)
  File "C:\User_Data\Instance Segmentation\yolact\backbone.py", line 143, in init_backbone
    state_dict = torch.load(path)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 905, in _legacy_load
    return legacy_load(f)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set_(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.
I have the same problem with PyTorch 1.11.0+cu113 and CUDA 11.3. I don't know where it comes from.
Same error when I try training with resnet50: python ./train.py --config=yolact_resnet50_config
I found a workaround that lets me launch training.
First, I use CUDA 11.3 and installed these torch packages:
- torch 1.10.2+cu113
- torchaudio 0.10.2+cu113
- torchvision 0.11.3+cu113
You can install them all with one pip command:
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Next, you need to edit one file, site-packages/torch/utils/data/sampler.py:
- Modify line 116: change
generator = torch.Generator()
to
generator = torch.Generator(device='cuda')
- Modify line 126: change
yield from torch.randperm(n, generator=generator).tolist()
to
yield from torch.randperm(n, generator=generator, device='cuda').tolist()
After that, I was able to launch python ./train.py --config=yolact_resnet50_myconfig
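For anyone who would rather not edit files inside site-packages: DataLoader accepts a generator argument, so the same idea can probably be applied from the training script instead. Below is a minimal sketch, not something I have checked against YOLACT's train.py. The helper name make_loader and the toy dataset are made up for illustration. It assumes the sampler error comes from the shuffle generator living on a different device than the one where tensors are created by default, so it builds the generator on that default device.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size):
    """Hypothetical helper: shuffle with a generator on the default tensor device."""
    # Device where factory functions (torch.empty, torch.randperm, ...) allocate
    # by default. This is the CPU unless the script changed the default tensor type.
    default_device = torch.empty(0).device
    # Assumption: the sampler needs its generator on that same device.
    generator = torch.Generator(device=default_device)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=generator)


# Toy dataset, used only to show that the loader iterates without a device error
dataset = TensorDataset(torch.arange(8))
loader = make_loader(dataset, batch_size=4)
seen = sorted(int(x) for (batch,) in loader for x in batch)
```

If this works in your setup, shuffling stays enabled and the installed torch package stays untouched, which also survives a reinstall or upgrade.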
I solved this issue this way. In backbone.py, change
torch.load(path)
----> torch.load(path, map_location='cuda:0')
Then another error appears in train.py (I think at line 270). Go to line 249 (where data_loader is defined) and change
shuffle=True
----> shuffle=False
This lets me run the code, but I don't know why my mAP has been stuck at 12 for a long time.
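To show what the map_location change does in isolation, here is a small self-contained sketch. It does not use the YOLACT backbone file: the in-memory buffer and the "weight" key are placeholders, and the fallback to CPU when no GPU is visible is my own assumption, not part of the fix described above.

```python
import io

import torch

# Assumption: target the first GPU when available, otherwise stay on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for a checkpoint file: save a tiny state dict into memory
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2, 3)}, buffer)
buffer.seek(0)

# map_location tells torch.load where to place the tensors it restores,
# instead of leaving that to the device recorded in the checkpoint
state_dict = torch.load(buffer, map_location=device)
loaded_device = state_dict["weight"].device
```

In backbone.py the equivalent would be passing the same map_location argument to the existing torch.load(path) call.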