learning_ray Not use the updated policy


Open zhiqiwangebay opened this issue 1 year ago • 0 comments

In the train_policy_parallel function of ch_03 (https://github.com/maxpumperla/learning_ray/blob/main/notebooks/ch_03_core_app.ipynb):

def train_policy_parallel(env, num_episodes=1000, num_simulations=4):
    """Parallel policy training function."""
    policy = Policy(env)
    simulations = [SimulationActor.remote() for _ in range(num_simulations)]

    policy_ref = ray.put(policy)
    for _ in range(num_episodes):
        experiences = [sim.rollout.remote(policy_ref) for sim in simulations]

        while len(experiences) > 0:
            finished, experiences = ray.wait(experiences)
            for xp in ray.get(finished):
                update_policy(policy, xp)

    return policy

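If I'm not mistaken, each episode uses the initially initialized policy rather than the updated one: policy_ref is created once by ray.put(policy) before the loop, and the later update_policy calls change only the local policy object, not the copy stored behind policy_ref.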
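To illustrate the concern without needing a Ray cluster, here is a minimal sketch that uses copy.deepcopy as a stand-in for the snapshot taken by ray.put. The Policy class, the put/get helpers, and the refresh_ref flag are all hypothetical and are not the notebook's code; the sketch only assumes that a stored object does not change when the local object is mutated afterwards.

```python
import copy


class Policy:
    """Hypothetical stand-in for the notebook's Policy: just an update counter."""

    def __init__(self):
        self.updates = 0


def put(obj):
    # Stand-in for ray.put (assumption): keep an independent snapshot of obj.
    return copy.deepcopy(obj)


def get(ref):
    # Stand-in for ray.get (assumption): hand back the stored snapshot.
    return ref


def train(num_episodes, refresh_ref):
    """Return the update count that each episode's rollout would observe."""
    policy = Policy()
    policy_ref = put(policy)  # created once, as in the function quoted above
    seen = []
    for _ in range(num_episodes):
        if refresh_ref:
            # Re-publish the current policy before this episode's rollouts.
            policy_ref = put(policy)
        seen.append(get(policy_ref).updates)  # what a rollout would see
        policy.updates += 1  # stands in for update_policy(policy, xp)
    return seen


stale = train(3, refresh_ref=False)
fresh = train(3, refresh_ref=True)
print(stale)  # [0, 0, 0]
print(fresh)  # [0, 1, 2]
```

If this reading is right, the corresponding change in train_policy_parallel might be to call ray.put(policy) inside the for loop over episodes, so that each round of rollouts receives the current policy.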

Dec 20 '23 13:12 zhiqiwangebay
