
Possible shaping error on _add_loss_op() in model.py

[Open] thefirebanks opened this issue Jul 09, 2019 · 1 comment

Hello!

I'm running into a reshaping error when using RL and intermediate rewards.

The output of intermediate_rewards() is a list of max_dec_step tensors, each of shape (batch_size, k) (per the comment on line 241).

This list is then supposed to be stacked and stored in self.sampling_discounted_rewards, but in my run self.sampling_discounted_rewards ends up with shape (batch_size, k).

Then in _add_loss_op(), the loop iterates k times and appends:

for _ in range(self._hps.k):
    self._sampled_rewards.append(self.sampling_discounted_rewards[:, :, _]) # shape (max_enc_steps, batch_size)

But the index [:, :, _] raises a dimension error, because self.sampling_discounted_rewards only has shape (batch_size, k) — slicing along a third axis requires a 3-D tensor of shape (max_dec_step, batch_size, k).

Am I missing something here? What should be the correct shape/reshaping? Thank you for uploading this code!
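To make the mismatch concrete, here is a minimal NumPy sketch (standing in for the TF tensors; the max_dec_step, batch_size, and k values are made up for illustration) of why the three-index slice fails on the 2-D array but works once the per-step rewards are stacked:

```python
import numpy as np

max_dec_step, batch_size, k = 5, 4, 3

# intermediate_rewards() returns a list of max_dec_step arrays,
# each of shape (batch_size, k)
rewards = [np.random.rand(batch_size, k) for _ in range(max_dec_step)]

# Without stacking, the stored rewards stay 2-D:
unstacked = rewards[0]                    # shape (batch_size, k)
try:
    unstacked[:, :, 0]                    # three indices on a 2-D array
except IndexError as e:
    print("2-D slice fails:", e)

# After stacking, the slice used in _add_loss_op() works:
stacked = np.stack(rewards)               # shape (max_dec_step, batch_size, k)
per_sample = [stacked[:, :, i] for i in range(k)]
print(per_sample[0].shape)                # (max_dec_step, batch_size)
```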

thefirebanks commented Jul 09 '19

Possible solution:

Change lines 414 and 427 of attention_decoder.py from

if FLAGS.use_discounted_rewards:

to

if FLAGS.use_discounted_rewards or FLAGS.use_intermediate_rewards:
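A hedged sketch of why widening the guard fixes the shape (the helper below is illustrative only — the actual stacking code and variable names in attention_decoder.py are assumptions, not quotes from the repo):

```python
import numpy as np

def collect_rewards(step_rewards, use_discounted_rewards, use_intermediate_rewards):
    """Mimic the guarded stacking in attention_decoder.py (names hypothetical).

    With the original guard (use_discounted_rewards only), running with
    intermediate rewards alone skips the stack, leaving a 2-D (batch_size, k)
    array that breaks the [:, :, i] slice in _add_loss_op().
    """
    if use_discounted_rewards or use_intermediate_rewards:  # proposed fix
        return np.stack(step_rewards)  # (max_dec_step, batch_size, k)
    return step_rewards[-1]            # last step only: (batch_size, k)

rewards = [np.random.rand(4, 3) for _ in range(5)]
out = collect_rewards(rewards, use_discounted_rewards=False,
                      use_intermediate_rewards=True)
print(out.shape)  # (5, 4, 3) — now 3-D, so out[:, :, i] is valid
```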

thefirebanks commented Jul 10 '19