ssd.pytorch
Expected a 'cuda' device type for generator but found 'cpu'
When I tried to execute train.py, I initially got some directory errors, which I fixed quickly by updating the directories in config.py and train.py, but the next error that comes up is as follows:
/content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use 'with torch.no_grad():' instead.
self.priors = Variable(self.priorbox.forward(), volatile=True)
Loading base network...
Initializing weights...
./train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/content/ssd.pytorch/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Traceback (most recent call last):
File "./train.py", line 255, in <module>
train()
File "./train.py", line 150, in train
batch_iterator = iter(data_loader)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 359, in __iter__
return self._get_iterator()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 944, in __init__
self._reset(loader, first_iter=True)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 975, in _reset
self._try_put_index()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
index = self._next_index()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 512, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py", line 226, in __iter__
for idx in self.sampler:
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py", line 124, in __iter__
yield from torch.randperm(n, generator=generator).tolist()
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
I'm running this in Google Colab using a GPU.
Same issue here! Running on my notebook with one Nvidia GPU, in the Spyder IDE.
I found a temporary solution to this problem on the web. It's not perfect, but it worked! This solution caused some other troubles down the road; I will try to show how to fix them all.
The solution is based on this assumption: some people claim the problem comes from this line in train.py:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
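A minimal repro supporting that claim (my own untested sketch, on an affected build such as PyTorch 1.9, GPU required): the sampler builds a CPU torch.Generator, while the CUDA default tensor type makes torch.randperm expect a CUDA one.

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')
g = torch.Generator()            # CPU generator, as RandomSampler creates it
torch.randperm(10, generator=g)  # RuntimeError: Expected a 'cuda' device type
                                 # for generator but found 'cpu'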
I commented out the whole block, as below:
# # using cuda to speed up computations - a line here below is causing errors!!
# if torch.cuda.is_available():
#     if args.cuda:
#         torch.set_default_tensor_type('torch.cuda.FloatTensor')
#         # this line above is causing errors!!!
#     if not args.cuda:
#         print("WARNING: It looks like you have a CUDA device, but aren't " +
#               "using CUDA.\nRun with --cuda for optimal training speed.")
#         torch.set_default_tensor_type('torch.FloatTensor')
# else:
#     torch.set_default_tensor_type('torch.FloatTensor')
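(An alternative some threads suggest, instead of commenting these lines out, is to keep the CUDA default tensor type but hand the DataLoader a CUDA generator so the sampler's torch.randperm runs on the same device. An untested sketch, with a dummy dataset standing in for VOC:)

import torch
from torch.utils.data import DataLoader, TensorDataset

torch.set_default_tensor_type('torch.cuda.FloatTensor')
dataset = TensorDataset(torch.randn(8, 3, 300, 300))  # dummy stand-in for VOC
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    generator=torch.Generator(device='cuda'))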
This problem disappears, but other problems show up.
- Edit box_utils.py as follows:
1.1) At the beginning of the intersect and jaccard functions, add these lines:
# my added lines due to errors!
if torch.cuda.is_available():
    box_a = box_a.cuda()
    box_b = box_b.cuda()
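In context, the top of intersect would then look like this (my sketch; the original computation is elided, and the same pattern applies to jaccard):

import torch

def intersect(box_a, box_b):
    # my added lines due to errors! (keep both inputs on the same device)
    if torch.cuda.is_available():
        box_a = box_a.cuda()
        box_b = box_b.cuda()
    ...  # the original min/max intersection computation follows unchanged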
1.2) At the beginning of the encode function, add these lines:
# my added lines due to errors!
if torch.cuda.is_available():
    matched = matched.cuda()
    priors = priors.cuda()
- Back in train.py, edit the following lines. Comment out or substitute these lines:
loc_loss += loss_l.data[0] #this line gives 0-dim error!
conf_loss += loss_c.data[0] #this line gives 0-dim error!
with:
# loc_loss += loss_l.data[0]  # this line gives 0-dim error!
loc_loss += loss_l.data.item()  # correction: added .item()
# conf_loss += loss_c.data[0]  # this line gives 0-dim error!
conf_loss += loss_c.data.item()  # correction: added .item()
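A quick standalone illustration of why .item() is needed: loss reductions return 0-dim tensors in modern PyTorch, and integer indexing into a 0-dim tensor raises an IndexError; .item() extracts the Python scalar instead.

import torch

loss = torch.tensor(2.5)  # a reduced loss is a 0-dim tensor
print(loss.item())        # 2.5 -- works
# print(loss[0])          # IndexError: invalid index of a 0-dim tensor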
And also right below, in the printing function (and the Visdom update):
# print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data[0]), end=' ')
print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.data.item()), end=' ')
# update_vis_plot(iteration, loss_l.data[0], loss_c.data[0],
update_vis_plot(iteration, loss_l.data.item(), loss_c.data.item(),
These should fix the problems! It took me the whole day to figure out, wow! Other issues about tensor sizes will appear, but the answers are in this thread and in some other issues.
I tried all these modifications, but I still get the following error:
/content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
self.priors = Variable(self.priorbox.forward(), volatile=True)
Loading base network...
Initializing weights...
./train.py:218: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/content/ssd.pytorch/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
/content/ssd.pytorch/utils/augmentations.py:238: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
(the warning above is repeated four times, once per worker)
./train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
/usr/local/lib/python3.6/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
Traceback (most recent call last):
File "./train.py", line 259, in
OK, you need to debug it one step at a time. It took me many hours to debug too, but I got it right.
Let's continue from your issues.
1) /content/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. self.priors = Variable(self.priorbox.forward(), volatile=True)
Here is the fix; look at the commented lines and change your code in ssd.py as below:
def __init__(self, phase, size, base, extras, head, num_classes):
    super(SSD, self).__init__()
    self.phase = phase
    self.num_classes = num_classes
    self.cfg = (coco, voc)[num_classes == 21]
    self.priorbox = PriorBox(self.cfg)
    # deprecated:
    # self.priors = Variable(self.priorbox.forward(), volatile=True)
    with torch.no_grad():  # updated version
        self.priors = self.priorbox.forward()  # updated version
    self.size = size
    # .... more fixes below, to other issues, still in ssd.py __init__:
    if phase == 'test':
        self.softmax = nn.Softmax(dim=-1)
        # ORIGINAL IMPLEMENTATION (DEPRECATED):
        # self.detect = Detect(num_classes, 0, 200, 0.01, 0.45)
        self.detect = Detect()  # corrected implementation
        # My comments: this 'Detect' function is not compatible with the new
        # PyTorch version; it generates the error "Legacy autograd function
        # with non-static forward method is deprecated. Please use new-style
        # autograd function with static forward method."
        # The correction passes the arguments above directly to .apply()
        # in the forward method below.
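For reference, a minimal sketch of the new-style pattern the error message asks for (my own illustration; the real body of Detect in the repo's detection.py, which decodes boxes and runs NMS, is elided here):

import torch

class Detect(torch.autograd.Function):
    # New-style autograd function: forward is static and receives every
    # argument explicitly, so the hyperparameters travel through .apply().
    @staticmethod
    def forward(ctx, num_classes, bkg_label, top_k, conf_thresh, nms_thresh,
                loc_data, conf_data, prior_data):
        output = torch.zeros(loc_data.size(0), num_classes, top_k, 5)
        ...  # decode loc_data against prior_data and run NMS (elided)
        return output

# called via .apply(), never by calling the instance directly, e.g.:
# out = Detect.apply(21, 0, 200, 0.01, 0.45, loc, conf, priors)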
Here is another fix, still in ssd.py, in the forward method:
if self.phase == "test":
    # ORIGINAL LINE (DEPRECATED):
    # output = self.detect(
    # corrected implementation:
    output = self.detect.apply(self.num_classes, 0, 200, 0.01, 0.45,
                               # loc preds
                               loc.view(loc.size(0), -1, 4),
                               # conf preds
                               self.softmax(conf.view(conf.size(0), -1, self.num_classes)),
                               # default boxes
                               self.priors.type(type(x.data))
                               )
2) ./train.py:218: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_. init.xavier_uniform(param)
Simple fix: just correct the function call in train.py as below:
def xavier(param):
    init.xavier_uniform_(param)
3) ./train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Just replace your code in train.py with the lines below (read the commented lines to understand):
# send to device
if args.cuda:
    # my modifications/additions
    images = images.cuda()
    with torch.no_grad():
        targets = [ann.cuda() for ann in targets]
    # ORIGINAL LINES (DEPRECATED):
    # images = Variable(images.cuda())
    # targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
# else:
#     # ORIGINAL LINES (DEPRECATED)
#     images = Variable(images)
#     targets = [Variable(ann, volatile=True) for ann in targets]
4) loss_c[pos] = 0  # filter out pos boxes for now
IndexError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0
I found this fix somewhere on GitHub; there is a thread about it. Anyway, here is the fix: just add one line before, as shown below, and voilà:
# Hard Negative Mining
loss_c = loss_c.view(pos.size()[0], pos.size()[1])  # added line to fix
# --------------------------------------------------------------------
loss_c[pos] = 0  # filter out pos boxes for now
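To see why the added view fixes the IndexError, here is a standalone illustration with the shapes from the error message (batch of 32, 8732 priors; 32 * 8732 = 279424):

import torch

N, num_priors = 32, 8732
loss_c = torch.rand(N * num_priors, 1)              # flattened: [279424, 1]
pos = torch.zeros(N, num_priors, dtype=torch.bool)  # mask: [32, 8732]
loss_c = loss_c.view(pos.size()[0], pos.size()[1])  # back to [32, 8732]
loss_c[pos] = 0                                     # mask and tensor now match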
It doesn't get more spelled out than that :)
I hope this helps you suffer less than I did 😂
Good luck!!!
Thanks a lot, it finally worked.
Turning the shuffle parameter off in the DataLoader also helped. Link
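(An untested sketch of that change, with a dummy dataset standing in for VOC: with shuffle=False a SequentialSampler is used, which never calls torch.randperm, so the CPU/CUDA generator mismatch is sidestepped, at the cost of unshuffled training batches.)

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(8, 3, 300, 300))  # dummy stand-in for VOC
loader = DataLoader(dataset, batch_size=4, num_workers=2, shuffle=False)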