
ARS agent does not score well when exploring

ar8372 opened this issue 3 years ago · 1 comment

Hey @colinskow, I have implemented ars.py for the bipedal problem. The score at iteration 1500 is around 330. At each step of the training loop we explore once using the code below:

            # Play an episode with the new weights and print the score
            reward_evaluation = self.explore()

Now I have saved the theta at iteration 1500 along with all the other parameters. Then I created a new instance of the Policy() class, initialized it with this pretrained theta, and explored 10 times, but the score is around 6.23, nowhere near 330. Can you tell me why this is happening?
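
For reference, here is roughly what I do, as a sketch (the import, the Policy constructor arguments, and the way I invoke explore() outside the training loop are approximations, not copied from ars.py):

    import numpy as np
    from ars import Policy, explore  # assumed imports; in ars.py explore() is a method, shown here as a plain call for brevity

    # --- at the end of training (iteration 1500): save the learned weights ---
    np.save("theta_1500.npy", trained_policy.theta)   # trained_policy is the Policy used in the training loop

    # --- later, in a separate script: rebuild the policy from the saved theta ---
    policy = Policy(input_size=24, output_size=4)     # BipedalWalker sizes; constructor signature approximated
    policy.theta = np.load("theta_1500.npy")

    # explore 10 times with the pretrained theta, same explore() as in the training loop
    scores = [explore(policy) for _ in range(10)]     # call form approximated
    print("mean score:", np.mean(scores))             # ~6.23 here, vs ~330 during training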

Each time in explore() we do self.env.reset(), which just restarts the env, so why is the reward from explore() so different when it is called from inside the training loop versus when I call it manually?

Let me know if my query is not clear, thanks.

ar8372 · Jun 24 '22

Hello @ar8372, I have the same problem...

When I try to use this in a "production env" it fails, and I did the same as you: I saved all the params (n, mean, mean_diff, var) and theta and loaded them into another instance of Policy, but I never get the reward that was reached during training.
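
Roughly what my save/load looks like, as a sketch (the attribute names n, mean, mean_diff and var are the ones I save; the import path and the constructors are approximations of ars.py):

    import numpy as np
    from ars import Policy, Normalizer  # assumed import path

    # --- save everything at the end of training ---
    np.savez("ars_checkpoint.npz",
             theta=policy.theta,
             n=normalizer.n,
             mean=normalizer.mean,
             mean_diff=normalizer.mean_diff,
             var=normalizer.var)

    # --- restore in the "production env" process ---
    ckpt = np.load("ars_checkpoint.npz")
    policy = Policy(input_size=24, output_size=4)   # constructor signature approximated
    normalizer = Normalizer(nb_inputs=24)           # constructor signature approximated
    policy.theta = ckpt["theta"]
    normalizer.n = ckpt["n"]
    normalizer.mean = ckpt["mean"]
    normalizer.mean_diff = ckpt["mean_diff"]
    normalizer.var = ckpt["var"]
    # then I run episodes with these restored objects, but the reward never
    # reaches the level it had at the end of training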

vfsousas · Jun 27 '22