option-critic Doesn't seem to converge on the Pong env.

Open tomchen1000 opened this issue 6 years ago • 4 comments

I ran it for >18000 episodes on the Pong environment, but the score doesn't converge to a high value.

May 10 '18 17:05 tomchen1000

Hi, thanks. I have noticed similar behaviour before. It would be great if you find a fix. Otherwise, I will try to fix it at the first chance I get.

May 14 '18 16:05 tdavchev

Have you found a fix yet?

Sep 03 '18 09:09 Wongcheukwai

@yadrimz @tomchen1000

In main.py, I found

    tot_reward = tf.Variable(0.)
    tf.summary.scalar("DOCA/Total Reward", tot_reward)
    cum_reward = tf.Variable(0.)
    tf.summary.scalar("DOCA/Cummulative Reward", tot_reward)

I think tf.summary.scalar("DOCA/Cummulative Reward", tot_reward) should be changed to tf.summary.scalar("DOCA/Cummulative Reward", cum_reward), so that the cumulative-reward summary reads from cum_reward instead of duplicating tot_reward.
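To illustrate the effect of that mix-up, here is a minimal TensorFlow-free sketch. The Var class, the log_rewards helper, and the sample rewards are hypothetical stand-ins, not code from the repository, and I am assuming tot_reward holds the latest episode's reward while cum_reward holds a running sum. Under those assumptions, the buggy binding makes the "Cummulative Reward" curve a copy of the "Total Reward" curve. Note that this only changes what gets logged; it would not by itself change how the agent trains.

```python
# Hypothetical stand-in for tf.Variable (NOT the real TensorFlow API).
class Var:
    def __init__(self, value):
        self.value = value


def log_rewards(episode_rewards, fixed):
    """Return the values each summary tag would record after every episode."""
    tot_reward = Var(0.0)  # assumed: reward of the most recent episode
    cum_reward = Var(0.0)  # assumed: running sum over all episodes
    # Map each summary tag to the variable it reads from.
    summaries = {
        "DOCA/Total Reward": tot_reward,
        # The buggy version binds this tag to tot_reward as well.
        "DOCA/Cummulative Reward": cum_reward if fixed else tot_reward,
    }
    history = []
    for r in episode_rewards:
        tot_reward.value = r
        cum_reward.value += r
        history.append({tag: var.value for tag, var in summaries.items()})
    return history


rewards = [-21.0, -20.0, -19.0]  # made-up Pong-like episode scores
buggy = log_rewards(rewards, fixed=False)
fixed = log_rewards(rewards, fixed=True)
print(buggy[-1])  # both tags show the last episode's reward
print(fixed[-1])  # the cumulative tag now shows the running sum
```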

Oct 21 '19 01:10 GoingMyWay

Hello, I tried running the model on different environments, but it does not converge at all. I made the above change to the cumulative reward summary, but that doesn't help. Any idea which environments this model works on, and for how many episodes?

Mar 05 '22 02:03 aayushee
