adventures-in-ml-code

Bugfix: policy gradient REINFORCE (TF2)

Open · asokraju opened this issue 5 years ago · 0 comments

Hi,

There is a bug in policy_gradient_reinforce_tf2.py at line 39:

loss = network.train_on_batch(states, discounted_rewards)

To fix this, I made two changes:

  1. One-hot encode the actions: one_hot_encode = np.array([[1 if a == i else 0 for i in range(2)] for a in actions])
  2. Pass the discounted rewards through the 'sample_weight' argument, so that they weight the 'categorical_crossentropy' loss at each step.
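Together, the two changes turn the Keras loss into the REINFORCE loss. With a one-hot action target, categorical cross-entropy for step t reduces to -log π(a_t | s_t), and weighting it by the discounted reward G_t gives -G_t · log π(a_t | s_t). Below is a minimal NumPy sketch that checks this equivalence. It assumes two actions (as in CartPole) and a mean-over-batch reduction, and the helper names are hypothetical, not taken from the repository:

```python
import numpy as np


def one_hot(actions, num_actions=2):
    # Vectorized equivalent of the list comprehension in change 1:
    # row t has a 1 in column actions[t] and 0 elsewhere.
    return np.eye(num_actions)[np.asarray(actions)]


def weighted_categorical_crossentropy(y_true, y_pred, sample_weight):
    # Per-step cross-entropy -sum_k y_true[t, k] * log(y_pred[t, k]),
    # scaled by its sample weight and averaged over the batch
    # (mean-over-batch reduction is an assumption of this sketch).
    eps = 1e-7
    ce = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
    return np.mean(sample_weight * ce)


def reinforce_loss(probs, actions, returns):
    # REINFORCE objective written directly: -mean_t G_t * log pi(a_t | s_t).
    actions = np.asarray(actions)
    log_pi = np.log(probs[np.arange(len(actions)), actions])
    return -np.mean(returns * log_pi)


# Toy batch: policy outputs pi(.|s_t), chosen actions, discounted rewards G_t.
probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])
actions = [0, 1, 1]
returns = np.array([1.5, -0.2, 0.8])

a = weighted_categorical_crossentropy(one_hot(actions), probs, returns)
b = reinforce_loss(probs, actions, returns)
print(np.isclose(a, b))  # True
```

In Keras terms, the fix presumably corresponds to a call like network.train_on_batch(states, one_hot_encode, sample_weight=discounted_rewards). Model.train_on_batch does accept a sample_weight argument, but the exact patched line isn't shown in this issue.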

I think this also resolves issues #26, #27, and #28.

I tested it on gym.make("CartPole-v0"), and it converged within 2000 episodes!

Episode: 1961, Reward: 200.0, avg loss: 0.01056
Episode: 1962, Reward: 200.0, avg loss: 0.02165
Episode: 1963, Reward: 200.0, avg loss: -0.04293
Episode: 1964, Reward: 200.0, avg loss: -0.00953
Episode: 1965, Reward: 200.0, avg loss: 0.02787
Episode: 1966, Reward: 200.0, avg loss: 0.00205
Episode: 1967, Reward: 200.0, avg loss: 0.01984
Episode: 1968, Reward: 200.0, avg loss: 0.00307
Episode: 1969, Reward: 200.0, avg loss: -0.03621
Episode: 1970, Reward: 200.0, avg loss: -0.02112
Episode: 1971, Reward: 200.0, avg loss: -0.00132
Episode: 1972, Reward: 200.0, avg loss: 0.02377
Episode: 1973, Reward: 200.0, avg loss: 0.02295
Episode: 1974, Reward: 200.0, avg loss: -0.01884
Episode: 1975, Reward: 200.0, avg loss: 0.02013
Episode: 1976, Reward: 200.0, avg loss: 0.02265
Episode: 1977, Reward: 200.0, avg loss: 0.00097
Episode: 1978, Reward: 200.0, avg loss: -0.03959
Episode: 1979, Reward: 200.0, avg loss: 0.00527
Episode: 1980, Reward: 200.0, avg loss: 0.02360
Episode: 1981, Reward: 200.0, avg loss: 0.03568
Episode: 1982, Reward: 200.0, avg loss: 0.00684
Episode: 1983, Reward: 200.0, avg loss: 0.00912
Episode: 1984, Reward: 200.0, avg loss: -0.03238
Episode: 1985, Reward: 200.0, avg loss: 0.03891
Episode: 1986, Reward: 200.0, avg loss: 0.01156
Episode: 1987, Reward: 200.0, avg loss: 0.04099
Episode: 1988, Reward: 200.0, avg loss: -0.00574
Episode: 1989, Reward: 200.0, avg loss: 0.01317
Episode: 1990, Reward: 200.0, avg loss: 0.00885
Episode: 1991, Reward: 200.0, avg loss: 0.02338
Episode: 1992, Reward: 200.0, avg loss: 0.00069
Episode: 1993, Reward: 200.0, avg loss: 0.01195
Episode: 1994, Reward: 200.0, avg loss: 0.02862
Episode: 1995, Reward: 200.0, avg loss: -0.00214
Episode: 1996, Reward: 200.0, avg loss: 0.01396
Episode: 1997, Reward: 200.0, avg loss: -0.01529
Episode: 1998, Reward: 200.0, avg loss: 0.01859
Episode: 1999, Reward: 200.0, avg loss: 0.02944
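Categorical cross-entropy is never negative, so the negative "avg loss" values above can only come from negative sample weights. That is what happens if the discounted rewards are standardized (mean-subtracted) before being used as weights. The issue doesn't show how discounted_rewards is computed, so the following is a minimal sketch under assumptions: the discount factor gamma=0.99 and the standardization step are illustrative, and discounted_returns is a hypothetical helper.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99, standardize=True):
    # G_t = r_t + gamma * G_{t+1}, accumulated backwards from the last step.
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    if standardize:
        # Subtract the mean and scale to unit variance. Some weights become
        # negative, so the weighted cross-entropy can also be negative.
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5, standardize=False).tolist())
# [1.75, 1.5, 1.0]
print((discounted_returns([1.0] * 10) < 0).any())  # True
```

Standardizing keeps the gradient scale roughly constant across episodes of different lengths, which is a common choice in REINFORCE implementations.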

asokraju, Sep 10 '20 17:09