RLSeq2Seq
Possible shaping error in _add_loss_op() in model.py
Hello!
I'm running into a reshaping error when using RL with intermediate rewards. The output of intermediate_rewards() (line 241) is a list of max_dec_step tensors, each of shape (batch_size, k). This list is then stacked and stored in self.sampling_discounted_rewards, which ends up with shape (batch_size, k).
But then in _add_loss_op() you iterate k times and append:

```python
for _ in range(self._hps.k):
    self._sampled_rewards.append(self.sampling_discounted_rewards[:, :, _])  # shape (max_enc_steps, batch_size)
```

The index [:, :, _] runs into a dimension error, because self.sampling_discounted_rewards has shape (batch_size, k) and therefore only two dimensions.
Am I missing something here? What should be the correct shape/reshaping? Thank you for uploading this code!
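For illustration, here is a minimal sketch of the shape mismatch. The sizes and the per_step_rewards name are made up for the example, not taken from the repo:

```python
import tensorflow as tf

max_dec_steps, batch_size, k = 5, 4, 3

# What intermediate_rewards() is described to return: a list of
# max_dec_steps tensors, each of shape (batch_size, k).
per_step_rewards = [tf.zeros([batch_size, k]) for _ in range(max_dec_steps)]

# Stacking over the step dimension keeps all three axes:
stacked = tf.stack(per_step_rewards)   # shape (max_dec_steps, batch_size, k)
ok = stacked[:, :, 0]                  # shape (max_dec_steps, batch_size) -- works

# But a tensor that is already (batch_size, k) has only two dimensions,
# so the [:, :, _] indexing in _add_loss_op() fails:
flat = tf.zeros([batch_size, k])
# flat[:, :, 0]  # raises: index out of range for a rank-2 tensor
```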
Possible solution: change lines 414 and 427 of attention_decoder.py from

```python
if FLAGS.use_discounted_rewards:
```

to

```python
if FLAGS.use_discounted_rewards or FLAGS.use_intermediate_rewards:
```
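If that is indeed the issue, those guards presumably control whether the per-step rewards are stacked over decode steps before being stored. A rough sketch of the presumed effect (a hypothetical paraphrase, not the actual attention_decoder.py source; per_step_rewards is a made-up name):

```python
# Hypothetical paraphrase of the guards around lines 414/427 of
# attention_decoder.py -- illustrating the presumed intent, not the source.
if FLAGS.use_discounted_rewards or FLAGS.use_intermediate_rewards:
    # Keep the decode-step axis so the stored tensor is
    # (max_dec_steps, batch_size, k), matching the [:, :, _] slicing
    # that _add_loss_op() performs.
    sampling_rewards = tf.stack(per_step_rewards)
```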