ImageCaptioning.pytorch
Make FCModel.py work on CUDA
I tried to run python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
(as described in the book on page 34) with several CUDA-enabled PyTorch builds, but every attempt failed with the errors below. With CPU-only PyTorch builds, everything works as expected.
(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder: ./data
0
listing all images in directory ./data
DataLoaderRaw found 1 images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
File "eval.py", line 132, in <module>
loss, split_predictions, lang_stats = eval_utils.eval_split(
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
return self.sample_beam(fc_feats, att_feats, opt)
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 144, in sample_beam
xt = self.embed(Variable(it, requires_grad=False))
File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
return F.embedding(
File "C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py", line 1814, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select
or
(dl-with-pt) C:\Users\pavel\dev\ImageCaptioning.pytorch>python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ./data
DataLoaderRaw loading images from folder: ./data
0
listing all images in directory ./data
DataLoaderRaw found 1 images
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
C:\Users\pavel\.conda\envs\dl-with-pt\lib\site-packages\torch\nn\functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py:149: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
logprobs = F.log_softmax(self.logit(output))
Traceback (most recent call last):
File "eval.py", line 132, in <module>
loss, split_predictions, lang_stats = eval_utils.eval_split(
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\eval_utils.py", line 106, in eval_split
seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 162, in sample
return self.sample_beam(fc_feats, att_feats, opt)
File "C:\Users\pavel\dev\ImageCaptioning.pytorch\models\FCModel.py", line 143, in sample_beam
it = fc_feats.data.new(beam_size).long().zero_()
RuntimeError: CUDA error: unknown error
I debugged the code, and it looks like the FCModel layers were never moved to the device.
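In the meantime, moving the whole caption model (and the CNN features) onto one device before sampling should avoid this. A minimal, self-contained sketch of the failure mode and the fix, with toy sizes rather than the actual eval.py code:
import torch
import torch.nn as nn

# Toy reproduction of the error: an embedding left on the CPU while its input
# indices live on the GPU triggers the same device mismatch as self.embed(it).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
embed = nn.Embedding(10, 4)                         # created on CPU, like FCModel's layers
it = torch.zeros(3, dtype=torch.long, device=device)
embed = embed.to(device)                            # the missing step: move the layer as well
xt = embed(it)                                      # weight and indices now on the same device
print(xt.device)
In eval.py terms that means calling model.to(device) (or model.cuda()) on the FCModel after load_state_dict, instead of expecting the layers to follow the input tensors on their own.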
Thank you :) I'm on the same book
I have just started working through the book too, and so far I'm struggling with the one-hot encoding concepts in chapter 4. I have really enjoyed the book so far, and the notebook scripts have worked. My system doesn't have a CUDA graphics card, and I'm sure I'm going to want one soon. I have started searching for one and wonder what others like you are using or would recommend. The new 3000-series cards are out of my budget, but the GTX 1660 looks good to me. Should I avoid it?
Fixed the issue on my end. Thanks.
@waszee just use https://colab.research.google.com/ and select GPU in Runtime->Change runtime type
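Once the GPU runtime is selected, a quick sanity check in a Colab cell confirms that PyTorch actually sees it:
import torch

print(torch.cuda.is_available())          # should print True on a GPU runtime
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU Colab assigned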
Thanks for the suggestion; I will check it out. I am in chapter 5 now. At the end of chapter 4 there was some material on audio files, and I tried to write some tensor-creation code for Morse code that I captured from my radio. I am struggling with the embedding concepts needed to handle patterns of different sizes. It makes a good brain teaser for an old guy :).
@elistevens would you mind taking a look at this PR?
FYI, I posted a separate query about audio files and DL tensors; I think it is number 9. I am still learning how to jump around and reference things already posted.
Fixed the issue on my end. Thanks.
Could you please tell me how you solved it when running the code on the GPU?
Sorry, I don't remember. I'll try to document the solution the next time it comes up. Good luck!
In my case, I ran into a slightly different error when running the same command:
─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder: ./data/
0
listing all images in directory ./data/
DataLoaderRaw found 1 images
/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will change to match the args list in nn.MaxPool2d in a future release.
warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
Traceback (most recent call last):
File "/home/joshua/Code/ImageCaptioning.pytorch/eval.py", line 132, in <module>
loss, split_predictions, lang_stats = eval_utils.eval_split(
File "/home/joshua/Code/ImageCaptioning.pytorch/eval_utils.py", line 106, in eval_split
seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 160, in sample
return self.sample_beam(fc_feats, att_feats, opt)
File "/home/joshua/Code/ImageCaptioning.pytorch/models/FCModel.py", line 141, in sample_beam
xt = self.img_embed(fc_feats[k:k+1]).expand(beam_size, self.input_encoding_size)
File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/joshua/Code/machine-learning/venv/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
Rather than forcing the model to use the CPU, I was able to make the following ugly hack, after which it worked correctly; @pbelevich, this is similar to your changes:
└─ $ ▶ git diff models/FCModel.py
diff --git a/models/FCModel.py b/models/FCModel.py
index c275b5b..8885e52 100644
--- a/models/FCModel.py
+++ b/models/FCModel.py
@@ -20,8 +20,8 @@ class LSTMCore(nn.Module):
self.drop_prob_lm = opt.drop_prob_lm
# Build a LSTM
- self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size)
- self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size)
+ self.i2h = nn.Linear(self.input_encoding_size, 5 * self.rnn_size, device='cuda:0')
+ self.h2h = nn.Linear(self.rnn_size, 5 * self.rnn_size, device='cuda:0')
self.dropout = nn.Dropout(self.drop_prob_lm)
def forward(self, xt, state):
@@ -59,10 +59,10 @@ class FCModel(CaptionModel):
self.ss_prob = 0.0 # Schedule sampling probability
- self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size)
+ self.img_embed = nn.Linear(self.fc_feat_size, self.input_encoding_size, device='cuda:0')
self.core = LSTMCore(opt)
- self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size)
- self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1)
+ self.embed = nn.Embedding(self.vocab_size + 1, self.input_encoding_size, device='cuda:0')
+ self.logit = nn.Linear(self.rnn_size, self.vocab_size + 1, device='cuda:0')
The script then produces the correct result:
└─ $ ▶ python3 eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_fold ./data/
DataLoaderRaw loading images from folder: ./data/
0
listing all images in directory ./data/
DataLoaderRaw found 1 images
...
image 1: a person riding a horse on a dirt road
evaluating validation preformance... -1/1 (0.000000)
loss: 0.0
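One caveat with my hack above: if I remember correctly, the device= keyword on nn.Linear and nn.Embedding only appeared around PyTorch 1.9, and it hard-codes cuda:0 into the model. A quick way to check whether this hack (or a model.to(device)-style fix) actually covered every layer is to look at where the parameters ended up; a small sketch, with a toy model standing in for the real FCModel:
import torch
import torch.nn as nn

# Stand-in for the FCModel instance in eval.py: any leftover 'cpu' entry in the
# set below points at a layer that was never moved to the GPU.
model = nn.Sequential(nn.Linear(4, 4), nn.Embedding(10, 4))
if torch.cuda.is_available():
    model = model.to('cuda:0')
devices = {p.device for p in model.parameters()}
print(devices)    # a single entry such as {device(type='cuda', index=0)} means every layer moved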