ElegantRL
🐛 Fix bug in PER module.
After updating the vectorized env and the corresponding multiprocessing training module, support for the PER algorithm was broken.
Corresponding Pull Request: https://github.com/AI4Finance-Foundation/ElegantRL/pull/269
The related issue: PER produces NaN because the multiprocessing module had not been adapted to PER.
With multiprocessing, there are `num_envs * num_workers` parallel subenvironments available to a learner. Previously, a single PER sumTree (binary search tree) served all of these subenvironments as they produced trajectories at the same time, which led to bugs.
After this modification, each subenvironment's output trajectory corresponds to its own sumTree (binary search tree). This fixes the bug and also reduces the size of each sumTree; a minimal sketch follows below.
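A minimal sketch of this per-subenvironment layout, assuming an illustrative `SumTree` and wrapper class rather than the exact ElegantRL implementation: each parallel trajectory stream writes into its own tree, so priority updates from different subenvironments never touch the same leaves.

```python
import torch

class SumTree:
    """Binary sum tree: leaves hold per-transition priorities, internal
    nodes hold the sum of their children (illustrative sketch)."""
    def __init__(self, buf_len: int):
        self.buf_len = buf_len
        self.tree = torch.zeros(2 * buf_len - 1)

    def update(self, data_idx: int, priority: float):
        tree_idx = data_idx + self.buf_len - 1     # leaf index inside the tree array
        delta = priority - float(self.tree[tree_idx])
        while tree_idx >= 0:                       # propagate the change up to the root
            self.tree[tree_idx] += delta
            tree_idx = (tree_idx - 1) // 2


class PerEnvSumTrees:
    """One SumTree per subenvironment: the num_envs * num_workers parallel
    trajectory streams never share (and corrupt) a single tree."""
    def __init__(self, buf_len: int, num_envs: int):
        # each tree only needs buf_len // num_envs leaves,
        # which also shrinks every individual tree
        self.trees = [SumTree(buf_len // num_envs) for _ in range(num_envs)]

    def update_priority(self, env_idx: int, data_idx: int, priority: float):
        self.trees[env_idx].update(data_idx, priority)
```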
This fix covers the following:
Agents folder.
- In AgentXXX.py, for both single-env and vectorized-env modes, make `agent.last_state.shape == (num_envs, state_dim)` so the shape of this tensor stays consistent.
- In AgentXXX.py, update `agent.get_obj_critic_per()` for all algorithms to adapt it to the PER updates (a hedged sketch follows this list).
- In net.py, make `logprob.shape == (batch_size,)` after summing over the action_dim dimension.
- In evaluator.py, update the import.
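A hedged sketch of what the PER-aware critic objective looks like for the off-policy agents; the buffer methods `sample_for_per` and `td_error_update_for_per` are assumed names, not the exact ElegantRL signatures. The importance-sampling weights scale the TD errors, and the absolute TD errors are handed back so the buffer can refresh its sumTree priorities.

```python
import torch

def get_obj_critic_per(cri, cri_target, act_target, buffer, batch_size: int, gamma: float):
    """Sketch of a PER critic objective: IS-weighted TD loss plus the
    |TD error| used to refresh the sumTree priorities (assumed buffer API)."""
    with torch.no_grad():
        # assumed API: PER sampling also returns IS weights and sample indices
        states, actions, rewards, undones, next_states, is_weights, indices = \
            buffer.sample_for_per(batch_size)
        next_actions = act_target(next_states)
        next_q = cri_target(next_states, next_actions)
        q_labels = rewards + undones * gamma * next_q

    q_values = cri(states, actions)
    td_errors = q_labels - q_values
    obj_critic = (is_weights.view(-1, 1) * td_errors.pow(2)).mean()      # IS weighting
    buffer.td_error_update_for_per(indices, td_errors.detach().abs())    # refresh priorities
    return obj_critic, states
```

The net.py change is just the reduction that gives `logprob.shape == (batch_size,)`: sum the per-dimension log-probabilities over the action_dim axis, e.g. `logprob = dist.log_prob(action).sum(dim=1)`.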
train folder.
- In replay_buffer.py, updated PER's sumTree and the ReplayBuffer that calls it (a sampling sketch follows this list).
- In run.py, updated the multiprocessing module so that PER remains compatible on Windows.
- In config.py, updated the PER-related parameters, and adjusted the `agent.last_state` setting so that `agent.last_state.shape == (num_envs, state_dim)`.
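On the buffer side, a hedged sketch of proportional sampling from one subenvironment's tree, reusing the `SumTree` sketch above; the function name and the `per_beta` parameter are assumptions (PER's usual α/β hyperparameters are the kind of values config.py would carry), not copied from the PR.

```python
import torch

def sample_for_per(tree: "SumTree", batch_size: int, per_beta: float):
    """Illustrative proportional PER sampling from one sum tree: stratified
    prefix-sum search plus normalized importance-sampling weights."""
    total_p = float(tree.tree[0])                    # root stores the sum of all priorities
    segments = torch.linspace(0.0, total_p, batch_size + 1)
    # draw one prefix-sum target inside each segment (stratified sampling)
    targets = segments[:-1] + torch.rand(batch_size) * (segments[1:] - segments[:-1])

    indices, priorities = [], []
    for v in targets.tolist():
        idx = 0
        while idx < tree.buf_len - 1:                # descend from the root to a leaf
            left = 2 * idx + 1
            if v <= float(tree.tree[left]):
                idx = left
            else:
                v -= float(tree.tree[left])
                idx = left + 1
        indices.append(idx - (tree.buf_len - 1))     # leaf position -> data index
        priorities.append(float(tree.tree[idx]))

    probs = torch.tensor(priorities).clamp_min(1e-8) / total_p  # guard zero-priority leaves
    is_weights = (probs * tree.buf_len).pow(-per_beta)
    is_weights /= is_weights.max()                   # keep the weight scale bounded
    return torch.tensor(indices), is_weights
```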
env folder.
- In CustomGymEnv, updated the installation code of the simulation environment used to check the PER algorithm.
example folder.
- Adjusted the off-policy algorithm demo by moving the code for `if_use_per=True` from `demo_DDPG_TD3_SAC.py` to `demo_PER_Prioritized Experience Replay.py` (a usage sketch follows this list).
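With the demo split, enabling PER becomes a one-flag change. A hedged usage sketch: the `Config`/`train_agent` names follow ElegantRL's usual pattern, but the exact import paths and constructor arguments are assumptions, not copied from the PR.

```python
# Hedged usage sketch: import paths and Config fields may differ
# from the exact ElegantRL version this PR targets.
import gym
from elegantrl.agents import AgentSAC
from elegantrl.train.config import Config
from elegantrl.train.run import train_agent

env_args = {"env_name": "Pendulum-v1", "state_dim": 3, "action_dim": 1, "if_discrete": False}
args = Config(agent_class=AgentSAC, env_class=gym.make, env_args=env_args)
args.if_use_per = True   # the flag that moved into the dedicated PER demo
train_agent(args)
```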
unit_test folder.
- Update the corresponding unit_test file (for `agent.last_state.shape == (num_envs, state_dim)`; a sketch follows below).
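A hedged sketch of the kind of shape assertion those tests add; the helper name is illustrative:

```python
import torch

def assert_last_state_shape(agent, num_envs: int, state_dim: int):
    """Illustrative check: agent.last_state must stay 2-D in both
    single-env (num_envs == 1) and vectorized-env modes."""
    assert isinstance(agent.last_state, torch.Tensor)
    assert agent.last_state.shape == (num_envs, state_dim)

# e.g. a single-env agent should hold a (1, state_dim) tensor, not (state_dim,):
# assert_last_state_shape(single_env_agent, num_envs=1, state_dim=4)
# assert_last_state_shape(vector_env_agent, num_envs=8, state_dim=4)
```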
Addition:
- Use `states, actions, rewards` instead of `state, action, reward` as tensor names.
- Do not use spaces in file names.
- Rename `get_returns` to `get_cumulative_rewards`.