yolact

PyTorch 1.10 training shows error: "Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu"."

Open lixubo-xupt opened this issue 2 years ago • 4 comments

How do I deal with this error: "Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match."? I would appreciate an answer and a way to fix it.

lixubo-xupt • May 14 '22 15:05

Hello, I have the same issue

Windows CUDA 11.3

(yolo_env) C:\User_Data\Instance Segmentation\yolact>python ./train.py --config=yolact_resnet50_philips_config
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit_recursive.py:234: UserWarning: 'lat_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit_recursive.py:234: UserWarning: 'downsample_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\jit_recursive.py:234: UserWarning: 'pred_layers' was found in ScriptModule constants, but it is a non-constant submodule. Consider removing it.
  warnings.warn("'{}' was found in ScriptModule constants, "
Initializing weights...
Traceback (most recent call last):
  File "C:\User_Data\Instance Segmentation\yolact\train.py", line 503, in <module>
    train()
  File "C:\User_Data\Instance Segmentation\yolact\train.py", line 213, in train
    yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path)
  File "C:\User_Data\Instance Segmentation\yolact\yolact.py", line 495, in init_weights
    self.backbone.init_backbone(backbone_path)
  File "C:\User_Data\Instance Segmentation\yolact\backbone.py", line 143, in init_backbone
    state_dict = torch.load(path)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 905, in legacy_load
    return legacy_load(f)
  File "C:\User_Data\Instance Segmentation\yolo_env\lib\site-packages\torch\serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

Nilesh-Hampiholi • May 20 '22 09:05

I have the same problem with PyTorch 1.11.0+cu113 and CUDA 11.3. I don't know where it comes from.

I get the same error when I try training with resnet50: python ./train.py --config=yolact_resnet50_config

cocofaivre • May 26 '22 09:05

I found a workaround that lets training start.

First, I use CUDA 11.3 and installed these packages for torch:

  • torch 1.10.2+cu113
  • torchaudio 0.10.2+cu113
  • torchvision 0.11.3+cu113

You can install them all with one pip command: pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/torch_stable.html
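After installing, it can help to confirm which build the environment actually picked up. This is only a minimal sketch; `describe_install` is a hypothetical helper name, not part of PyTorch or YOLACT.

```python
import torch


def describe_install():
    """Collect the version details that matter for this issue."""
    return {
        "torch": torch.__version__,          # e.g. a "+cu113" suffix on CUDA wheels
        "cuda_build": torch.version.cuda,    # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_install())
```

If `cuda_build` is None or `cuda_available` is False, pip probably installed a CPU-only wheel and the install command should be rechecked.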

Next, you need to edit one file (reference here): site-packages/torch/utils/data/sampler.py

  • Line 116: change generator = torch.Generator() to generator = torch.Generator(device='cuda')
  • Line 126: change yield from torch.randperm(n, generator=generator).tolist() to yield from torch.randperm(n, generator=generator, device='cuda').tolist()
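A possible alternative that avoids editing files inside site-packages is to hand the DataLoader a generator that lives on the same device as newly created tensors. This is a sketch under the assumption that the sampler fails because its CPU generator does not match a CUDA default tensor type; `make_shuffling_loader` and `demo_dataset` are hypothetical names used only for illustration, not YOLACT code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_shuffling_loader(dataset, batch_size, seed=0):
    """Build a shuffling DataLoader whose generator matches the default device."""
    # New tensors land on the default device (CUDA if the script has set a
    # CUDA default tensor type, otherwise CPU); the generator must match it.
    default_device = torch.empty(0).device
    generator = torch.Generator(device=default_device)
    generator.manual_seed(seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      generator=generator)


# Tiny stand-in dataset so the sketch runs anywhere.
demo_dataset = TensorDataset(torch.arange(10))
demo_loader = make_shuffling_loader(demo_dataset, batch_size=4)

# Every index should appear exactly once per epoch, in shuffled order.
seen = sorted(int(x) for (batch,) in demo_loader for x in batch)
print(seen)
```

In a training script the same `generator=` argument would be passed where the data loader is constructed, which keeps the installed PyTorch package unmodified.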

After that, I was able to launch python ./train.py --config=yolact_resnet50_myconfig

cocofaivre • May 26 '22 12:05

I solved this issue this way: torch.load(path) ----> torch.load(path, map_location='cuda:0')
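The `map_location` argument tells `torch.load` where to place the stored tensors, so the loaded storages end up on the requested device. Here is a minimal sketch of that change, written so it also runs on a machine without a GPU; `load_state_dict_on` and the demo checkpoint are hypothetical, not YOLACT code.

```python
import os
import tempfile

import torch


def load_state_dict_on(path, device=None):
    """Load a checkpoint and remap every stored tensor onto `device`."""
    if device is None:
        # Fall back to CPU when no GPU is present so the sketch runs anywhere.
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return torch.load(path, map_location=device)


# Write a tiny stand-in checkpoint, then load it back onto an explicit device.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_backbone.pth")
torch.save({"conv.weight": torch.zeros(2, 3)}, demo_path)

state_dict = load_state_dict_on(demo_path, device="cpu")
print(state_dict["conv.weight"].device)
```

In the thread's case the device passed would be 'cuda:0', matching the device named in the error message.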

Another error then appears in train.py (around line 270, I think). To fix it, go to line 249 (where data_loader is defined) and change shuffle=True ----> shuffle=False

This lets me run the code, but I don't know why my mAP has been stuck at 12 for a long time.

mariocorradetti • May 31 '22 09:05