ImageCaptioning.pytorch Make FCModel.py working on CUDA

I tried to run python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data (as described in the book on page 34) with several CUDA-enabled PyTorch builds, but every attempt failed with one of the following errors. With CPU-only PyTorch builds, everything works as expected.

(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder:  ./data
0
listing all images in directory ./data
DataLoaderRaw found  1  images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 144, in sample_beam
    xt = self.embed(Variable(it, requires_grad=False))
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

or

(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder:  ./data
0
listing all images in directory ./data
DataLoaderRaw found  1  images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
  File "eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 143, in sample_beam
    it = fc_feats.data.new(beam_size).long().zero_()
RuntimeError: CUDA error: unknown error

I debugged the code, and it looks like the FCModel layers are never moved to the device.
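
For readers hitting the same error, here is a minimal sketch of what "moving the layers to the device" usually means in PyTorch. TinyCaptioner and its sizes are hypothetical stand-ins for illustration, not the real FCModel, and the sketch falls back to the CPU when no GPU is visible:

```python
import torch
import torch.nn as nn


class TinyCaptioner(nn.Module):
    """Hypothetical stand-in with the same kinds of layers as FCModel."""

    def __init__(self, feat_size=8, enc_size=4, vocab_size=10):
        super().__init__()
        self.img_embed = nn.Linear(feat_size, enc_size)
        self.embed = nn.Embedding(vocab_size + 1, enc_size)

    def forward(self, fc_feats, tokens):
        # Both inputs must live on the same device as the layer weights.
        return self.img_embed(fc_feats) + self.embed(tokens)


# Pick the GPU when one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to(device) moves every registered parameter and buffer in one call.
model = TinyCaptioner().to(device)

# Create the inputs directly on the same device as the model.
fc_feats = torch.randn(3, 8, device=device)
tokens = torch.zeros(3, dtype=torch.long, device=device)

out = model(fc_feats, tokens)
print(out.shape, out.device)
```

If the weights stay on the CPU while the inputs are on the GPU (or the other way round), calls such as nn.Embedding fail with a device-mismatch RuntimeError like the first one above.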

Sep 08 '20 21:09 pbelevich

Thank you :) I'm on the same book

Sep 21 '20 09:09 supersonic71

I have just started working through the book too. So far I am struggling in chapter 4 with one-hot encoding concepts, but I have really enjoyed the book and the notebook scripts have worked. My system doesn't have a CUDA graphics card, and I am sure I am going to want one soon. I have started to search for one and wonder what others like you are using or recommend. The new 3000 series cards are out of my budget, but the GTX 1660 looks good to me. Should I avoid it?

Oct 02 '20 23:10 waszee

Fixed the issue on my end. Thanks.

Oct 03 '20 22:10 jhagege

@waszee just use https://colab.research.google.com/ and select GPU in Runtime->Change runtime type
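
After switching the runtime type, a quick check like the following sketch can confirm that PyTorch actually sees a GPU. It only uses torch.cuda.is_available and torch.cuda.get_device_name, and reports "cpu" when no GPU is attached:

```python
import torch

# True only when a CUDA-enabled PyTorch build can see at least one GPU.
gpu_available = torch.cuda.is_available()

# Query the name of GPU 0 only when it exists; otherwise report the CPU.
device_name = torch.cuda.get_device_name(0) if gpu_available else "cpu"

print("CUDA available:", gpu_available)
print("Device:", device_name)
```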

Oct 06 '20 03:10 pbelevich

Thanks for the suggestion; I will check it out. I am in chapter 5 now. At the end of chapter 4 there was some material on audio files, and I tried to write some tensor creation code for Morse code that I captured from my radio. I am struggling with embedding concepts to handle patterns of different sizes. Makes a good brain teaser for an old guy :).

Oct 06 '20 04:10 waszee

@elistevens would you mind taking a look at this PR?

Oct 06 '20 13:10 pbelevich

FYI, I posted a separate query about audio files and DL tensors. I think it is number 9. I am still learning how to jump around and reference stuff already posted.

Oct 06 '20 18:10 waszee

Fixed the issue on my end. Thanks.

Could you please tell me how you solved that when running the code on a GPU?

Jan 13 '22 13:01 kaiser-hamid-rabbi

Fixed the issue on my end. Thanks.

Could you please tell me how you solved that when running the code on a GPU?

Sorry, I don't remember. I'll try to document the solution the next time it happens. Good luck!

Jan 13 '22 13:01 jhagege

In my case, I ran into a slightly different error when running the same command:

─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder:  ./data/
0
listing all images in directory ./data/
DataLoaderRaw found  1  images
/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will changeto match the args list in nn.MaxPool2d in a future release.
  warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
Traceback (most recent call last):
  File "/home/joshua/Code/ImageCaptioning.pytorch/eval.py", line 132, in <module>
    loss, split_predictions, lang_stats = eval_utils.eval_split(
  File "/home/joshua/Code/ImageCaptioning.pytorch/eval_utils.py", line 106, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 160, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 141, in sample_beam
    xt = self.img_embed(fc_feats[k:k+1]).expand(beam_size, self.input_encoding_size)
  File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

Rather than force the model to use CPU, I was able to make the following ugly hack (after which it worked correctly); @pbelevich this is similar to your changes:

└─ $ ▶ git diff models/FCModel.py
diff --git a/models/FCModel.py b/models/FCModel.py
index c275b5b..8885e52 100644
--- a/models/FCModel.py
+++ b/models/FCModel.py
@@ -20,8 +20,8 @@ class LSTMCore(nn.Module):
         self.drop_prob_lm = opt.drop_prob_lm
         
         # Build a LSTM
-        self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size)
-        self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size)
+        self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size, device='cuda:0')
+        self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size, device='cuda:0')
         self.dropout = nn.Dropout(self.drop_prob_lm)
 
     def forward(self, xt, state):
@@ -59,10 +59,10 @@ class FCModel(CaptionModel):
 
         self.ss_prob = 0.0 # Schedule sampling probability
 
-        self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size)
+        self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size, device='cuda:0')
         self.core = LSTMCore(opt)
-        self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size)
-        self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1)
+        self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size, device='cuda:0')
+        self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1, device='cuda:0')

The script then produces the correct result:

└─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder:  ./data/
0
listing all images in directory ./data/
DataLoaderRaw found  1  images
...
image 1: a person riding a horse on a dirt road
evaluating validation preformance... -1/1 (0.000000)
loss:  0.0
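
A possible alternative to hardcoding device='cuda:0' in every constructor, sketched under assumptions: choose the device once, build the layers as usual, move the whole module with .to(device), and derive new tensors from an input that is already on that device. TinyCore and its sizes are hypothetical and only mirror the i2h/h2h layers in the diff above; this is not the change that was applied to the repository:

```python
import torch
import torch.nn as nn

# Choose the device once; fall back to the CPU when no GPU is visible.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class TinyCore(nn.Module):
    """Hypothetical stand-in mirroring the two linear layers of LSTMCore."""

    def __init__(self, input_encoding_size=4, rnn_size=6):
        super().__init__()
        # No device argument here: the layers start on the CPU.
        self.i2h = nn.Linear(input_encoding_size, 5 * rnn_size)
        self.h2h = nn.Linear(rnn_size, 5 * rnn_size)

    def forward(self, xt, h):
        return self.i2h(xt) + self.h2h(h)


# A single .to(device) call moves all layers, so nothing is hardcoded.
core = TinyCore().to(device)

xt = torch.randn(2, 4, device=device)

# new_zeros creates tensors on the same device as xt.
h = xt.new_zeros(2, 6)
it = xt.new_zeros(3, dtype=torch.long)

all_input_sums = core(xt, h)
print(all_input_sums.shape, it.device)
```

The same script then runs unchanged on CPU-only and CUDA-enabled builds, whereas the hardcoded 'cuda:0' variant fails on a machine without a GPU.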

Mar 11 '22 05:03 jsgoller1
