
Expected a 'cuda' device type for generator but found 'cpu'

Open AchrafSd opened this issue 3 years ago • 5 comments

When I tried to execute train.py, I initially got some directory errors, which I fixed quickly by updating the directories in config.py and train.py. But the next error that came up is as follows:

/content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use 'with torch.no_grad():' instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
Loading base network...
Initializing weights...
./train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/content/ssd.pytorch/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Traceback (most recent call last):
  File "./train.py", line 255, in <module>
    train()
  File "./train.py", line 150, in train
    batch_iterator = iter(data_loader)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 944, in __init__
    self._reset(loader, first_iter=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 975, in _reset
    self._try_put_index()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
    index = self._next_index()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py", line 226, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py", line 124, in __iter__
    yield from torch.randperm(n, generator=generator).tolist()
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

I'm running this in google colab using GPU.

AchrafSd avatar Sep 30 '21 16:09 AchrafSd

Same issue here! Running on my notebook with one Nvidia GPU, in the Spyder IDE.

I found a temporary solution to this problem on the web. It's not perfect, but it worked! The solution caused some other trouble down the road; I will try to show how to fix it all.

The workaround is based on this assumption: some people claim the problem comes from this line in train.py: torch.set_default_tensor_type('torch.cuda.FloatTensor')

I commented out the whole block, as below:

# #using cuda to speed up computations - a line here below is causing errors!!
# if torch.cuda.is_available():
#     if args.cuda:
#         torch.set_default_tensor_type('torch.cuda.FloatTensor')
#         #this line above is causing errors!!!
#     if not args.cuda:
#         print("WARNING: It looks like you have a CUDA device, but aren't " +
#               "using CUDA.\nRun with --cuda for optimal training speed.")
#         torch.set_default_tensor_type('torch.FloatTensor')
# else:
#     torch.set_default_tensor_type('torch.FloatTensor')
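
A gentler alternative I have seen suggested (not tested in this thread): keep the default-tensor line and instead hand the DataLoader an explicit CUDA generator, since the crash comes from the RandomSampler's torch.randperm drawing from a CPU generator while the default tensor type is CUDA. A minimal sketch, assuming a PyTorch version whose DataLoader accepts the generator argument; `dataset` stands in for the dataset object built in train.py:

```python
import torch
from torch.utils import data

# Sketch, untested here: a generator living on the GPU keeps the sampler's
# torch.randperm call consistent with the 'torch.cuda.FloatTensor' default.
cuda_gen = torch.Generator(device='cuda')
data_loader = data.DataLoader(dataset, batch_size=32, shuffle=True,
                              generator=cuda_gen)
```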

This problem disappears, but other problems show up.

1. Edit box_utils.py as follows:

1.1) At the beginning of the intersect and jaccard functions, add these lines:

    #my added lines due to errors!
    if torch.cuda.is_available():
        box_a = box_a.cuda()
        box_b = box_b.cuda()
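
For context, here is roughly how those lines slot into intersect (the body should match the upstream function; only the two-line guard at the top is new):

```python
import torch

def intersect(box_a, box_b):
    # my added lines due to errors: move both inputs to the GPU so they match
    # the default CUDA tensor type set in train.py
    if torch.cuda.is_available():
        box_a = box_a.cuda()
        box_b = box_b.cuda()
    # original body: pairwise intersection areas of [x1, y1, x2, y2] boxes
    A = box_a.size(0)
    B = box_b.size(0)
    max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
                       box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
    min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2),
                       box_b[:, :2].unsqueeze(0).expand(A, B, 2))
    inter = torch.clamp((max_xy - min_xy), min=0)
    return inter[:, :, 0] * inter[:, :, 1]
```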

1.2) At the beginning of the encode function, add these lines:

    #my added lines due to errors!
    if torch.cuda.is_available():
        matched = matched.cuda()
        priors = priors.cuda()

2. Back in train.py, edit the following lines: substitute (or comment out) the loss accumulation lines that give a 0-dim error, as shown:
# loc_loss += loss_l.data[0]        # this line gives a 0-dim error!
loc_loss += loss_l.data.item()      # correction: added .item()
# conf_loss += loss_c.data[0]       # this line gives a 0-dim error!
conf_loss += loss_c.data.item()     # correction: added .item()
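
The reason for the change: since PyTorch 0.4, losses are 0-dim tensors, so indexing with [0] raises an error and .item() is the way to pull out the Python scalar. A quick illustration:

```python
import torch

loss = torch.tensor(1.5)   # 0-dim tensor, like loss_l.data in PyTorch >= 0.4
# loss[0]                  # IndexError: invalid index of a 0-dim tensor
print(loss.item())         # 1.5 -- extracts the Python float
```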

and also, right below, in the printing call:

# print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data[0]), end=' ')
print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data.item()), end=' ')
# update_vis_plot(iteration, loss_l.data[0], loss_c.data[0],
update_vis_plot(iteration, loss_l.data.item(), loss_c.data.item(),

These should fix the problems! It took me the whole day to figure this out, wow! Other issues about tensor sizes will appear, but the answers are here and in some other GitHub issues.

ronichester avatar Oct 04 '21 19:10 ronichester


I tried all these modifications, but I still get the following error:

/content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
Loading base network...
Initializing weights...
./train.py:218: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/content/ssd.pytorch/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
/content/ssd.pytorch/utils/augmentations.py:238: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  mode = random.choice(self.sample_options)
(the warning above is repeated four times, once per DataLoader worker)
./train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
/usr/local/lib/python3.6/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "./train.py", line 259, in <module>
    train()
  File "./train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0  # filter out pos boxes for now
IndexError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

AchrafSd avatar Oct 25 '21 09:10 AchrafSd

OK, you need to debug it one step at a time. It also took me many hours to debug, but I got it right.

Let's continue from your issues.

1) /content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. self.priors = Variable(self.priorbox.forward(), volatile=True)

Here is the fix; look at the commented lines and change your code in ssd.py as below:

def __init__(self, phase, size, base, extras, head, num_classes):
        super(SSD, self).__init__()
        self.phase = phase
        self.num_classes = num_classes
        self.cfg = (coco, voc)[num_classes == 21]
        self.priorbox = PriorBox(self.cfg)
        #deprecated:
        # self.priors = Variable(self.priorbox.forward(), volatile=True)
        with torch.no_grad():                      #updated version
            self.priors = self.priorbox.forward()  #updated version
        self.size = size

# ... more fixes below for other issues, still in ssd.py, in __init__:

if phase == 'test':
            self.softmax = nn.Softmax(dim=-1)
            #ORIGINAL IMPLEMENTATION DEPRECATED
            # self.detect = Detect(num_classes, 0, 200, 0.01, 0.45)
            self.detect = Detect()  #corrected implementation
            #my comments
            #this 'Detect' Function is not compatible with new Pytorch version,
            #generates error 'Legacy autograd function with non-static forward 
            #method is deprecated. Please use new-style autograd function with
            #static forward method.'
            #Correction is implemented by passing above arguments directly to
            #command .apply() at the forward method below.

Here is another fix still in ssd.py, in the forward method:

 if self.phase == "test":
            #ORIGINAL LINE IS DEPRECATED
            # output = self.detect(
            #corrected implementation:
            output = self.detect.apply(self.num_classes, 0, 200, 0.01, 0.45,
                # loc preds
                loc.view(loc.size(0), -1, 4),
                # conf preds
                self.softmax(conf.view(conf.size(0), -1, self.num_classes)),
                # default boxes
                self.priors.type(type(x.data))                  
            )
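
For anyone wondering why .apply() fixes the 'legacy autograd function' error: new-style autograd functions use a static forward, so there is no instance to configure, and the old constructor arguments simply ride along through apply(). A minimal illustration of the pattern (a toy function, not the repo's Detect):

```python
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, factor, x):
        # 'factor' plays the role of Detect's num_classes/top_k/... arguments
        ctx.factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_out):
        # no gradient for the plain-Python 'factor' argument
        return None, grad_out * ctx.factor

x = torch.ones(3, requires_grad=True)
y = Scale.apply(2.0, x)   # config passed through apply, as in self.detect.apply(...)
y.sum().backward()
print(x.grad)             # tensor([2., 2., 2.])
```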

2) ./train.py:218: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_. init.xavier_uniform(param)

Simple fix: just correct the function call in train.py as below:

def xavier(param):
    init.xavier_uniform_(param)

3) ./train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. targets = [Variable(ann.cuda(), volatile=True) for ann in targets]

Just replace your code in train.py with the lines below (read the comments to understand):

        # send to device
        if args.cuda:
            #my modifications/additions
            images = images.cuda()
            with torch.no_grad():
                targets = [ann.cuda() for ann in targets]
            # ORIGINAL LINES (DEPRECATED)
            # images = Variable(images.cuda())
            # targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
        # else:
        #     # ORIGINAL LINES (DEPRECATED)
        #     images = Variable(images)
        #     targets = [Variable(ann, volatile=True) for ann in targets]

4) loss_c[pos] = 0 # filter out pos boxes for now → IndexError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

I found this fix somewhere here on GitHub; there is a thread about it. Anyway, here is the fix: just add one line right before, as shown below, and voilà:

```python
# Hard Negative Mining
loss_c = loss_c.view(pos.size()[0], pos.size()[1])  # added line to fix
# --------------------------------------------------------------------
loss_c[pos] = 0  # filter out pos boxes for now
```
It couldn't be spelled out more clearly than that :)
I hope this helps you suffer less than I did 😂
Good luck!!!

ronichester avatar Oct 25 '21 16:10 ronichester


Thanks a lot, it finally worked.

AchrafSd avatar Oct 26 '21 08:10 AchrafSd

Turning the shuffle parameter off in the dataloader helped. Link
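
That is, with shuffle off the DataLoader uses a SequentialSampler, so the CPU torch.randperm call that triggers the error never runs, at the cost of unshuffled batches. Roughly, assuming the names from the repo's train.py (data is torch.utils.data, detection_collate is the repo's collate function):

```python
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=False,  # was True; sequential sampling avoids randperm
                              collate_fn=detection_collate,
                              pin_memory=True)
```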

Aayushktyagi avatar May 23 '22 14:05 Aayushktyagi