ElegantRL
🐛 Fix bug in PER module.
After updating the vectorized env and the corresponding multiprocessing training module, support for the PER algorithm was broken.
Corresponding Pull Request: https://github.com/AI4Finance-Foundation/ElegantRL/pull/269
The related issue: PER produces NaN because the multiprocessing module had not been adapted to PER.
With multiprocessing, there are `num_envs * num_workers` parallel subenvironments available to a learner. Previously, a single PER sumTree (binary search tree) served all of these subenvironments as they produced trajectories at the same time, which led to bugs.
After this modification, each subenvironment's output trajectory corresponds to its own sumTree (binary search tree). This fixes the bug and also reduces the size of each sumTree; a minimal sketch follows below.
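A minimal sketch of this per-subenvironment layout, assuming an illustrative `SumTree` and wrapper class rather than the exact ElegantRL implementation: each parallel trajectory stream writes into its own tree, so priority updates from different subenvironments never touch the same leaves.

```python
import torch

class SumTree:
    """Binary sum tree: leaves hold per-transition priorities, internal
    nodes hold the sum of their children (illustrative sketch)."""
    def __init__(self, buf_len: int):
        self.buf_len = buf_len
        self.tree = torch.zeros(2 * buf_len - 1)

    def update(self, data_idx: int, priority: float):
        tree_idx = data_idx + self.buf_len - 1     # leaf index inside the tree array
        delta = priority - float(self.tree[tree_idx])
        while tree_idx >= 0:                       # propagate the change up to the root
            self.tree[tree_idx] += delta
            tree_idx = (tree_idx - 1) // 2


class PerEnvSumTrees:
    """One SumTree per subenvironment: the num_envs * num_workers parallel
    trajectory streams never share (and corrupt) a single tree."""
    def __init__(self, buf_len: int, num_envs: int):
        # each tree only needs buf_len // num_envs leaves,
        # which also shrinks every individual tree
        self.trees = [SumTree(buf_len // num_envs) for _ in range(num_envs)]

    def update_priority(self, env_idx: int, data_idx: int, priority: float):
        self.trees[env_idx].update(data_idx, priority)
```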
This fix covers the following:
Agents folder.
- In AgentXXX.py, for both single-env and vectorized-env modes, make `agent.last_state.shape == (num_envs, state_dim)` so the shape of this tensor stays consistent.
- In AgentXXX.py, update `agent.get_obj_critic_per()` for all algorithms to adapt it to the PER updates (a hedged sketch follows this list).
- In net.py, make `logprob.shape == (batch_size,)` after summing over the action_dim dimension.
- In evaluator.py, update the import.
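A hedged sketch of what the PER-aware critic objective looks like for the off-policy agents; the buffer methods `sample_for_per` and `td_error_update_for_per` are assumed names, not the exact ElegantRL signatures. The importance-sampling weights scale the TD errors, and the absolute TD errors are handed back so the buffer can refresh its sumTree priorities.

```python
import torch

def get_obj_critic_per(cri, cri_target, act_target, buffer, batch_size: int, gamma: float):
    """Sketch of a PER critic objective: IS-weighted TD loss plus the
    |TD error| used to refresh the sumTree priorities (assumed buffer API)."""
    with torch.no_grad():
        # assumed API: PER sampling also returns IS weights and sample indices
        states, actions, rewards, undones, next_states, is_weights, indices = \
            buffer.sample_for_per(batch_size)
        next_actions = act_target(next_states)
        next_q = cri_target(next_states, next_actions)
        q_labels = rewards + undones * gamma * next_q

    q_values = cri(states, actions)
    td_errors = q_labels - q_values
    obj_critic = (is_weights.view(-1, 1) * td_errors.pow(2)).mean()      # IS weighting
    buffer.td_error_update_for_per(indices, td_errors.detach().abs())    # refresh priorities
    return obj_critic, states
```

The net.py change is just the reduction that gives `logprob.shape == (batch_size,)`: sum the per-dimension log-probabilities over the action_dim axis, e.g. `logprob = dist.log_prob(action).sum(dim=1)`.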
train folder.
- In replay_buffer.py, updated PER's sumTree and the ReplayBuffer that calls it (a sampling sketch follows this list).
- In run.py, updated the multiprocessing module so that PER remains compatible on Windows.
- In config.py, updated the PER-related parameters, and adjusted the `agent.last_state` setting so that `agent.last_state.shape == (num_envs, state_dim)`.
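On the buffer side, a hedged sketch of proportional sampling from one subenvironment's tree, reusing the `SumTree` sketch above; the function name and the `per_beta` parameter are assumptions (PER's usual α/β hyperparameters are the kind of values config.py would carry), not copied from the PR.

```python
import torch

def sample_for_per(tree: "SumTree", batch_size: int, per_beta: float):
    """Illustrative proportional PER sampling from one sum tree: stratified
    prefix-sum search plus normalized importance-sampling weights."""
    total_p = float(tree.tree[0])                    # root stores the sum of all priorities
    segments = torch.linspace(0.0, total_p, batch_size + 1)
    # draw one prefix-sum target inside each segment (stratified sampling)
    targets = segments[:-1] + torch.rand(batch_size) * (segments[1:] - segments[:-1])

    indices, priorities = [], []
    for v in targets.tolist():
        idx = 0
        while idx < tree.buf_len - 1:                # descend from the root to a leaf
            left = 2 * idx + 1
            if v <= float(tree.tree[left]):
                idx = left
            else:
                v -= float(tree.tree[left])
                idx = left + 1
        indices.append(idx - (tree.buf_len - 1))     # leaf position -> data index
        priorities.append(float(tree.tree[idx]))

    probs = torch.tensor(priorities).clamp_min(1e-8) / total_p  # guard zero-priority leaves
    is_weights = (probs * tree.buf_len).pow(-per_beta)
    is_weights /= is_weights.max()                   # keep the weight scale bounded
    return torch.tensor(indices), is_weights
```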
env folder.
- In CustomGymEnv, updated the installation code of the simulation environment used to check the PER algorithm.
example folder.
- Adjusted the off-policy algorithm demo by moving the code for `if_use_per=True` from `demo_DDPG_TD3_SAC.py` to `demo_PER_Prioritized Experience Replay.py` (a usage sketch follows this list).
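With the demo split, enabling PER becomes a one-flag change. A hedged usage sketch: the `Config`/`train_agent` names follow ElegantRL's usual pattern, but the exact import paths and constructor arguments are assumptions, not copied from the PR.

```python
# Hedged usage sketch: import paths and Config fields may differ
# from the exact ElegantRL version this PR targets.
import gym
from elegantrl.agents import AgentSAC
from elegantrl.train.config import Config
from elegantrl.train.run import train_agent

env_args = {"env_name": "Pendulum-v1", "state_dim": 3, "action_dim": 1, "if_discrete": False}
args = Config(agent_class=AgentSAC, env_class=gym.make, env_args=env_args)
args.if_use_per = True   # the flag that moved into the dedicated PER demo
train_agent(args)
```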
unit_test folder.
- Update the corresponding unit_test file (for `agent.last_state.shape == (num_envs, state_dim)`; a sketch follows below).
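A hedged sketch of the kind of shape assertion those tests add; the helper name is illustrative:

```python
import torch

def assert_last_state_shape(agent, num_envs: int, state_dim: int):
    """Illustrative check: agent.last_state must stay 2-D in both
    single-env (num_envs == 1) and vectorized-env modes."""
    assert isinstance(agent.last_state, torch.Tensor)
    assert agent.last_state.shape == (num_envs, state_dim)

# e.g. a single-env agent should hold a (1, state_dim) tensor, not (state_dim,):
# assert_last_state_shape(single_env_agent, num_envs=1, state_dim=4)
# assert_last_state_shape(vector_env_agent, num_envs=8, state_dim=4)
```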
Addition:
- Use `states, actions, rewards` instead of `state, action, reward` as tensor names.
- Do not use spaces in file names.
- Rename `get_returns` to `get_cumulative_rewards`.