DistRL-TensorFlow2
Bug in the Quantile Huber loss?
Hi,
First of all, thanks for publicly sharing your implementations of these reinforcement learning algorithms. I find your repos very useful!
As I was playing around with QR-DQN, I think I noticed a bug in your implementation of the quantile Huber loss function. The code seems to run fine when batch_size == atoms. However, if the two differ, you get an error due to incompatible tensor shapes in line 75 of QR-DQN.py:
```python
loss = tf.where(tf.less(error_loss, 0.0), inv_tau * huber_loss, tau * huber_loss)
```
I think the error stems from the fact that the TF2 implementation of the Huber loss reduces the rank of the output by one relative to the inputs (docs), i.e. it averages over the last axis, even when setting reduction=tf.keras.losses.Reduction.NONE.
This is different from the behavior in TF1, where the output shape matches that of the input (docs). Therefore, if I am not mistaken, one could fix this by replacing self.huber_loss with tf.compat.v1.losses.huber_loss (called with reduction=tf.compat.v1.losses.Reduction.NONE).
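For what it's worth, here is a minimal snippet (assuming TF 2.x; the shapes are arbitrary, chosen only to make the mismatch visible) that shows the difference between the two APIs:

```python
import tensorflow as tf

y_true = tf.zeros((32, 8, 8))  # e.g. [batch, atoms, atoms]
y_pred = tf.ones((32, 8, 8))

# The Keras Huber loss averages over the last axis even with reduction NONE.
keras_huber = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.NONE)
print(keras_huber(y_true, y_pred).shape)  # (32, 8)

# The TF1-style loss keeps the full elementwise shape of the inputs.
v1_huber = tf.compat.v1.losses.huber_loss(
    y_true, y_pred, reduction=tf.compat.v1.losses.Reduction.NONE)
print(v1_huber.shape)  # (32, 8, 8)
```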
I am having a bit of a hard time working out the exact dimensions the different operations act on, so I would be happy to hear from you whether my theory is correct :P
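In case it helps, here is a rough sketch of how I would compute the quantile Huber loss elementwise, without relying on the Keras reduction behavior at all. The function and variable names (target, theta, tau, kappa) are my own, not the repo's, and the shape conventions are my best guess at what QR-DQN.py intends:

```python
import tensorflow as tf

def quantile_huber_loss(target, theta, tau, kappa=1.0):
    # target: [batch, atoms] target quantile samples
    # theta:  [batch, atoms] predicted quantiles
    # tau:    [atoms] quantile fractions

    # Pairwise TD errors: [batch, atoms(target), atoms(pred)]
    error = target[:, :, None] - theta[:, None, :]

    # Elementwise Huber loss; no axis is reduced, unlike tf.keras.losses.Huber.
    abs_error = tf.abs(error)
    huber = tf.where(abs_error <= kappa,
                     0.5 * tf.square(error),
                     kappa * (abs_error - 0.5 * kappa))

    # Quantile weighting |tau - 1{error < 0}|, broadcast over batch/targets.
    tau = tf.reshape(tau, [1, 1, -1])
    weight = tf.abs(tau - tf.cast(error < 0.0, tf.float32))

    # Mean over target samples, sum over quantiles, mean over the batch.
    loss = weight * huber / kappa
    return tf.reduce_mean(tf.reduce_sum(tf.reduce_mean(loss, axis=1), axis=1))
```

With tau of shape [atoms], every intermediate tensor here stays at [batch, atoms, atoms], so nothing depends on batch_size == atoms.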
Could you post the corrected solution here? When training IQN, the loss doesn't seem to converge.