
🐛 Fix bug in PER module.

Open Yonv1943 opened this issue 2 years ago • 2 comments

After updating the vectorized env and the corresponding multiprocessing training module, support for the PER algorithm was broken.

Corresponding Pull Request: https://github.com/AI4Finance-Foundation/ElegantRL/pull/269


The related issue is as follows: PER produces NaN values because the multiprocessing module had not been adapted to PER.

With multiprocessing, there are num_envs * num_workers parallel subenvironments producing data for a learner. In the previous design, a single PER SumTree (a binary tree whose internal nodes store priority sums) received trajectories from all of these subenvironments at the same time, which led to bugs.

After the modification, each subenvironment's output trajectory corresponds to its own SumTree. This fixes the bug and also reduces the size of each SumTree (a minimal sketch follows).
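A minimal sketch of the idea, assuming a simplified SumTree (the class below is illustrative, not the exact ElegantRL implementation): leaves hold per-transition priorities, internal nodes hold their sums, and each subenvironment gets its own small tree instead of all subenvironments sharing one.

```python
import numpy as np

class SumTree:
    """Minimal sum tree: leaves hold priorities, internal nodes hold sums (illustrative only)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1, dtype=np.float64)  # internal nodes + leaves

    def update(self, leaf_idx: int, priority: float):
        tree_idx = leaf_idx + self.capacity - 1
        delta = priority - self.tree[tree_idx]
        self.tree[tree_idx] = priority
        while tree_idx > 0:                    # propagate the change up to the root
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += delta

    def sample_leaf(self, value: float) -> int:
        idx = 0
        while idx < self.capacity - 1:         # descend until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)

# Before the fix: one tree received transitions from all subenvironments at once.
# After the fix: one (smaller) tree per subenvironment, e.g.
num_envs, num_workers, per_env_capacity = 4, 2, 2 ** 12
sum_trees = [SumTree(per_env_capacity) for _ in range(num_envs * num_workers)]
```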

Yonv1943 · Feb 06 '23 01:02

This fix covers the following.

Agents folder.

  • In AgentXXX.py, keep agent.last_state.shape == (num_envs, state_dim) in both single-env and vectorized-env mode, so the shape of this tensor stays consistent (see the sketch after this list).
  • In AgentXXX.py, update agent.get_obj_critic_per() for all algorithms to adapt it to the PER changes.
  • In net.py, make logprob.shape == (batch_size,) by summing over the action_dim dimension.
  • In evaluator.py, update the imports.
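Hedged sketches of the agent-side conventions listed above; the critic objective is a DQN-style example, and the buffer methods sample_for_per / update_priorities are assumed names, not the exact ElegantRL API.

```python
import torch

num_envs, state_dim, action_dim, batch_size = 4, 8, 2, 256

# 1) agent.last_state keeps the vectorized shape even when num_envs == 1.
last_state = torch.zeros((num_envs, state_dim))            # shape (num_envs, state_dim)

# 2) logprob is summed over the action dimension so that logprob.shape == (batch_size,).
dist = torch.distributions.Normal(torch.zeros(batch_size, action_dim),
                                  torch.ones(batch_size, action_dim))
sampled_actions = dist.sample()
logprob = dist.log_prob(sampled_actions).sum(dim=1)        # shape (batch_size,)

# 3) A get_obj_critic_per-style objective: importance-sampling weights scale the critic
#    loss, and |TD error| is handed back to the buffer to refresh priorities.
def get_obj_critic_per(cri, cri_target, buffer, batch_size, gamma=0.99):
    with torch.no_grad():
        (states, actions, rewards, undones, next_states,
         is_weights, is_indices) = buffer.sample_for_per(batch_size)   # hypothetical call
        next_q = cri_target(next_states).max(dim=1, keepdim=True).values
        q_label = rewards + undones * gamma * next_q       # rewards/undones: (batch_size, 1)
    q_value = cri(states).gather(1, actions.long())        # actions: (batch_size, 1) indices
    td_error = (q_label - q_value).detach().abs()          # new priorities for sampled items
    obj_critic = (is_weights.view(-1, 1) * (q_label - q_value).pow(2)).mean()
    buffer.update_priorities(is_indices, td_error)          # hypothetical priority update
    return obj_critic
```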

train folder.

  • In replay_buffer.py, updated PER's SumTree and the ReplayBuffer that calls it (a hedged buffer sketch follows this list).
  • In run.py, updated the multiprocessing module for PER compatibility on Windows.
  • In config.py, updated the PER-related parameters and adjusted the agent.last_state setup so that agent.last_state.shape == (num_envs, state_dim).
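A hedged sketch of how the per-subenvironment trees might be wired into a replay buffer, reusing the SumTree class from the sketch above. The class and method names are illustrative, and per_alpha / per_beta are assumed parameter names rather than the exact config.py fields.

```python
import numpy as np
import torch

class PerReplayBufferSketch:
    """Illustrative buffer holding one SumTree per subenvironment (SumTree as sketched above)."""

    def __init__(self, max_size, num_envs, per_alpha=0.6, per_beta=0.4):
        self.max_size = max_size                      # capacity of each per-env tree
        self.per_alpha = per_alpha                    # priority exponent (assumed name)
        self.per_beta = per_beta                      # importance-sampling exponent (assumed name)
        self.sum_trees = [SumTree(max_size) for _ in range(num_envs)]
        self.cursors = [0] * num_envs                 # next write position per subenvironment
        self.cur_sizes = [0] * num_envs               # stored transitions per subenvironment

    def store_priority(self, env_idx, td_error=1.0):
        """New transitions get a large default priority so they are sampled at least once."""
        leaf_id = self.cursors[env_idx]
        self.sum_trees[env_idx].update(leaf_id, (abs(td_error) + 1e-6) ** self.per_alpha)
        self.cursors[env_idx] = (leaf_id + 1) % self.max_size
        self.cur_sizes[env_idx] = min(self.cur_sizes[env_idx] + 1, self.max_size)
        return leaf_id

    def sample_indices(self, env_idx, batch_size):
        """Sample leaf indices proportionally to priority and return normalized IS weights."""
        tree = self.sum_trees[env_idx]
        total = tree.tree[0]                          # root stores the sum of all priorities
        values = np.random.uniform(0.0, total, size=batch_size)
        leaf_ids = np.array([tree.sample_leaf(v) for v in values])
        probs = np.maximum(tree.tree[leaf_ids + tree.capacity - 1] / total, 1e-8)
        weights = (self.cur_sizes[env_idx] * probs) ** (-self.per_beta)
        weights /= weights.max()                      # normalize so the largest weight is 1
        return leaf_ids, torch.as_tensor(weights, dtype=torch.float32)
```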

env folder.

  • In CustomGymEnv, updated the installation code for the simulation environment used to check the PER algorithm.

example folder.

  • Adjusted the off-policy algorithm demo by moving the code that sets if_use_per=True from demo_DDPG_TD3_SAC.py to demo_PER_Prioritized Experience Replay.py (a usage sketch follows below).
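A hedged usage sketch for the relocated demo. Config, AgentSAC, and train_agent follow ElegantRL's usual layout, but the exact import paths, constructor arguments, and environment setup here are assumptions, not a copy of the demo file.

```python
from elegantrl.train.config import Config
from elegantrl.train.run import train_agent
from elegantrl.agents.AgentSAC import AgentSAC

# env_class / env_args are placeholders; fill in a real environment as in the other demos.
args = Config(agent_class=AgentSAC, env_class=None, env_args=None)
args.if_use_per = True      # switch the replay buffer to prioritized experience replay
train_agent(args)
```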

unit_test folder.

  • Updated the corresponding unit_test files (for agent.last_state.shape == (num_envs, state_dim)); a minimal test sketch follows below.
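A minimal, hypothetical unit test of the new shape convention; it builds a stand-in tensor rather than a real agent.

```python
import torch

def test_last_state_shape_is_vectorized():
    """agent.last_state should be (num_envs, state_dim) in both single-env and vectorized mode."""
    state_dim = 8
    for num_envs in (1, 4):                              # single env and vectorized envs
        last_state = torch.zeros((num_envs, state_dim))  # stand-in for agent.last_state
        assert last_state.shape == (num_envs, state_dim)
```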

Yonv1943 · Feb 06 '23 01:02

Additions:

  • Use states, actions, rewards instead of state, action, reward as the tensor names.
  • Do not use spaces in file names.
  • Rename get_returns to get_cumulative_rewards.

Yonv1943 · Feb 06 '23 07:02