option-critic

Doesn't seem to converge on the Pong env.

Open tomchen1000 opened this issue 6 years ago • 4 comments

I ran it for >18000 episodes on the Pong environment. The score doesn't converge to a high value.

tomchen1000 • May 10 '18 17:05

Hi, thanks, I have noticed similar behaviour before. It would be great if you find a fix. Otherwise, I will try to fix it at the first chance.

tdavchev • May 14 '18 16:05

Have you found a fix yet?

Wongcheukwai • Sep 03 '18 09:09

@yadrimz @tomchen1000

In main.py, I found

    tot_reward = tf.Variable(0.)
    tf.summary.scalar("DOCA/Total Reward", tot_reward)
    cum_reward = tf.Variable(0.)
    tf.summary.scalar("DOCA/Cummulative Reward", tot_reward)

I think tf.summary.scalar("DOCA/Cummulative Reward", tot_reward) should be changed to tf.summary.scalar("DOCA/Cummulative Reward", cum_reward), so that the second summary logs cum_reward instead of logging tot_reward a second time.
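To illustrate the effect of that wiring, here is a minimal pure-Python stand-in. It is not TensorFlow and not the repository's code: `Var`, `build_summaries`, and `log_episodes` are hypothetical names, and it assumes tot_reward holds the latest episode's reward while cum_reward accumulates across episodes. As far as the quoted lines show, the mistake only changes which value is logged under the "DOCA/Cummulative Reward" tag, so correcting it would fix the plot but would not by itself be expected to change training behaviour.

```python
# Pure-Python stand-in for the summary wiring (NOT TensorFlow; all names are hypothetical).

class Var:
    """Tiny mutable holder that plays the role of a scalar variable."""
    def __init__(self, value=0.0):
        self.value = value


def build_summaries(fixed):
    """Map summary tags to the variables they read from."""
    tot_reward = Var()
    cum_reward = Var()
    summaries = {
        "DOCA/Total Reward": tot_reward,
        # Quoted wiring reads tot_reward here as well; the proposed fix reads cum_reward.
        "DOCA/Cummulative Reward": cum_reward if fixed else tot_reward,
    }
    return tot_reward, cum_reward, summaries


def log_episodes(episode_rewards, fixed):
    """Return what each tag would record after every episode."""
    tot_reward, cum_reward, summaries = build_summaries(fixed)
    history = []
    for reward in episode_rewards:
        tot_reward.value = reward    # assumption: reward of the latest episode
        cum_reward.value += reward   # assumption: running sum over episodes
        history.append({tag: var.value for tag, var in summaries.items()})
    return history


if __name__ == "__main__":
    rewards = [-21.0, -19.0, -20.0]  # made-up Pong-like episode scores
    print(log_episodes(rewards, fixed=False)[-1])
    print(log_episodes(rewards, fixed=True)[-1])
```

With the quoted wiring both tags record the same number after every episode; with the proposed change the second tag tracks the running sum.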

GoingMyWay • Oct 21 '19 01:10

Hello, I tried running the model on different environments, but it does not converge at all. I made the cumulative reward change above, but that doesn't help. Any idea which environments this model works on, and for how many episodes?

aayushee • Mar 05 '22 02:03