option-critic
Doesn't seem to converge on the Pong env.
I ran it for >18000 episodes on the Pong environment. The score doesn't converge to a high value.
Hi, thanks, I have noticed similar behaviour before. It would be great if you could find a fix. Otherwise, I will try to fix it at the first chance.
Have you found a fix yet?
@yadrimz @tomchen1000
In main.py, I found
tot_reward = tf.Variable(0.)
tf.summary.scalar("DOCA/Total Reward", tot_reward)
cum_reward = tf.Variable(0.)
tf.summary.scalar("DOCA/Cummulative Reward", tot_reward)
I think tf.summary.scalar("DOCA/Cummulative Reward", tot_reward)
should be changed to
tf.summary.scalar("DOCA/Cummulative Reward", cum_reward )
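To illustrate what this change affects, here is a minimal plain-Python sketch, not the repository's code. `ScalarLog` and `run` are hypothetical stand-ins for the summary wiring, and the meaning of the two variables (latest episode reward versus a running sum) is an assumption made only for the example. It shows that with the original wiring both tags record the same series, and that the fix only changes what gets logged.

```python
# Plain-Python stand-in for the summary wiring; no TensorFlow needed.
# ScalarLog and run are hypothetical names, not part of the repository.

class ScalarLog:
    """Maps a summary tag to a getter and records its value on each flush."""

    def __init__(self):
        self.sources = {}
        self.history = {}

    def scalar(self, tag, getter):
        self.sources[tag] = getter
        self.history[tag] = []

    def flush(self):
        for tag, getter in self.sources.items():
            self.history[tag].append(getter())


def run(buggy, episode_rewards):
    state = {"tot_reward": 0.0, "cum_reward": 0.0}
    log = ScalarLog()
    log.scalar("DOCA/Total Reward", lambda: state["tot_reward"])
    # The bug: the second tag reads tot_reward instead of cum_reward.
    source = "tot_reward" if buggy else "cum_reward"
    log.scalar("DOCA/Cummulative Reward", lambda: state[source])
    for reward in episode_rewards:
        state["tot_reward"] = reward   # assumption: reward of the latest episode
        state["cum_reward"] += reward  # assumption: running sum over episodes
        log.flush()
    return log.history


rewards = [-21.0, -20.0, -19.0]
buggy_history = run(True, rewards)
fixed_history = run(False, rewards)
```

Under these assumptions, the buggy run logs identical curves for both tags, while the fixed run logs a running sum for the second tag. Either way, only the logged values differ, so this change on its own would not be expected to alter how the agent trains.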
Hello, I tried running the model on different environments, but it does not converge at all. I made the cumulative reward change above, but that doesn't help. Any idea which environments this model works on, and for how many episodes?