When I modify the number of LSTM layers and then run train_full_rl.py, training fails with the following error:
Start training
Traceback (most recent call last):
  File "train_full_rl.py", line 231, in <module>
    train(args)
  File "train_full_rl.py", line 186, in train
    trainer.train()
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 193, in train_step
    self._stop_reward_fn, self._stop_coeff
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 60, in a2c_train_step
    (inds, ms), bs = agent(raw_arts)
  File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 221, in forward
    outputs = self._ext(enc_art)
  File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 130, in forward
    self._hop_v, self._hop_wq)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 74, in attention
    PtrExtractorRL.attention_score(attention, query, v, w), dim=-1)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 66, in attention_score
    sum = attention + torch.mm(query, w)
RuntimeError: The size of tensor a (13) must match the size of tensor b (3) at non-singleton dimension 0
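For reference, the same broadcasting failure can be reproduced in isolation. The sizes below are made up and only mirror the numbers in the error message; this is just an illustration of the mismatch, not the repo's actual code:

```python
import torch

# Made-up sizes mirroring the error: 13 encoded sentences,
# a 3-layer LSTM hidden state used as the query, hidden size 256.
n_sents, n_layers, hidden = 13, 3, 256

attention = torch.randn(n_sents, hidden)  # per-sentence attention features
query = torch.randn(n_layers, hidden)     # one row per LSTM layer
w = torch.randn(hidden, hidden)

# torch.mm(query, w) has shape (n_layers, hidden). With n_layers == 1 the
# addition broadcasts as (13, 256) + (1, 256) and works; with n_layers > 1
# dim 0 is 13 vs 3, which raises the same RuntimeError as above.
s = attention + torch.mm(query, w)
```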
I've encountered the same problem... any luck?
Edit:
It seems that the shape of torch.mm(query, w) depends on the number of layers specified.
For example:
- when I set the number of layers to 2, the shape of torch.mm(query, w) was (2, x)
- when I set the number of layers to 1, the shape of torch.mm(query, w) was (1, x) (and this works)
(I have not tested more than 2 layers yet...)
So, it looks like we need to somehow reshape torch.mm(query, w) to be (1, x)?
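A rough, untested sketch of that reshape idea, in case it helps. This is not the actual attention_score from model/rl.py (the helper name and shapes below are only illustrative); it just shows how keeping only the top layer's row makes the product (1, x) so it broadcasts against the sentence dimension:

```python
import torch

def attention_score_last_layer(attention, query, w):
    # Hypothetical helper, not the function from the repo.
    # If the query carries one row per LSTM layer, keep only the top
    # layer so the result is (1, hidden) and broadcasts against the
    # (n_sents, hidden) attention tensor.
    if query.dim() == 2 and query.size(0) > 1:
        query = query[-1:]
    return attention + torch.mm(query, w)

# Quick shape check with made-up sizes:
attention = torch.randn(13, 256)
query = torch.randn(3, 256)   # e.g. a 3-layer LSTM hidden state
w = torch.randn(256, 256)
print(attention_score_last_layer(attention, query, w).shape)  # torch.Size([13, 256])
```

Whether dropping the lower layers' states is actually the right semantics for the multi-hop attention is a separate question, of course.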
This might be a bug. I did not test 2 layers since 1 layer already gives good results. I will investigate this ASAP.