
A3C Basic Doom: effect of episode length (Discuss)

Open IbrahimSobh opened this issue 8 years ago • 0 comments

Hi

This is to discuss how the episode length may affect the learning process.

Case 1: The default as in the repo

The smoothed reward settles around 0.55 (see figure below).

game.set_episode_timeout(300)

doom_basic_all
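The "smoothed" curves in these figures are typically produced with an exponential moving average. A minimal sketch, assuming TensorBoard-style smoothing (the function name `smooth` and the weight 0.9 are my own choices, not from the repo):

```python
def smooth(values, weight=0.9):
    """Exponential moving average over a reward series.

    weight close to 1.0 gives a smoother (slower-reacting) curve,
    similar to TensorBoard's smoothing slider.
    """
    smoothed = []
    last = values[0]  # seed with the first value
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

episode_rewards = [0.0, 1.0, 0.2, 0.9, 0.6]
print(smooth(episode_rewards))
```

Plotting `smooth(rewards)` instead of the raw per-episode rewards makes plateaus like "around 0.55" visible through the noise.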

Case 2: Shorter episode

game.set_episode_timeout(150)

Results are very similar to Case 1.

doom_basic_episode_150

Case 3: Very short episode

game.set_episode_timeout(70)

The agent should find the policy quickly, because it has a very limited time window to explore. However, convergence is delayed (after about 500 episodes).

  • However, the smoothed reward is around 0.65, higher than Case 1's 0.55 (see figure below). Why? Convergence is delayed, but on the other hand we get better rewards. I mean that the agent usually accomplishes the task in less time compared to Case 1; the agent seems more efficient and focused. What do you think?!

doom_basic_episode_70

Case 4: Longer episode

game.set_episode_timeout(450)

The smoothed reward is around 0.62, the smoothed episode length is around 33, and convergence is delayed compared to Case 1.

doom_basic_episode_450

Case 5: Each worker has its own episode length

Is it even a valid idea?!

Where: episode length = 75 + (worker_number * 25)

worker_0: episode length = 75
worker_1: episode length = 100
worker_2: episode length = 125
...
worker_7: episode length = 250
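The per-worker schedule above can be sketched as a small helper; a minimal sketch (the function name `episode_length` is mine, not from the repo, and each worker would pass its value to `game.set_episode_timeout` during setup):

```python
def episode_length(worker_number, base=75, step=25):
    """Per-worker episode timeout: worker_0 -> 75, worker_7 -> 250."""
    return base + worker_number * step

# Build the schedule for 8 A3C workers
lengths = [episode_length(i) for i in range(8)]
print(lengths)  # 75, 100, 125, ..., 250
```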

It seems that worker_7 (episode length 250) converged faster than worker_0 (episode length 75).

The following figure includes all workers:
doom_basic_episode_75_to_250_all

The following figure includes only worker_0 (episode length = 75) and worker_7 (episode length = 250):
doom_basic_episode_75_to_250_two

However, all workers share the same global network. Do you think having different episode lengths could affect / enhance the learning? What do you think?

Again: Is it even a valid idea?!

Case 6: Each worker has its own episode length, with a larger range

Where: episode length = 100 + (worker_number * 50)

worker_0: episode length = 100
worker_1: episode length = 150
worker_2: episode length = 200
...
worker_7: episode length = 450
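The wider Case 6 schedule follows the same base-plus-step pattern, just with a larger spread; a minimal sketch (helper name and dict layout are my own, not from the repo):

```python
def episode_length(worker_number, base=100, step=50):
    """Case 6 per-worker timeout: worker_0 -> 100, worker_7 -> 450."""
    return base + worker_number * step

# Map each worker name to the timeout it would set on its Doom game
schedule = {f"worker_{i}": episode_length(i) for i in range(8)}
print(schedule["worker_0"], schedule["worker_4"], schedule["worker_7"])  # 100 300 450
```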

The following figure includes all workers:
doom_basic_episode_100_to_450_all

The following figure includes only worker_0 (episode length = 100), worker_4 (episode length = 300), and worker_7 (episode length = 450):

doom_basic_episode_100_to_450_three

Within this range, the longer the episode, the faster the learning.

IbrahimSobh avatar Mar 18 '17 16:03 IbrahimSobh