ERL issues

Yiyang-Wu opened this issue 3 years ago · 3 comments

  • run.py

    • line 91: agent.state --> agent.states is already set in init_agent; states is used instead of state.
    • line 109: buffer.update_buffer --> this is already done on line 107
  • AgentBase.py

    • line 116: while i < target_step or not done --> 'or' means the loop always runs until done is reached; replace with 'and'?
    • line 282: traj_list1 = [map(list, zip(*traj_list))] --> raises an error; change to 'traj_list1 = list(zip(*traj_list))'? --> is there a stable version? (see the transpose sketch after this list)
    • line 276: convert_trajectory function --> is it possible to get rid of the for loop to make it more efficient?
  • replay_buffer.py

    • lines 129/130: traj_list --> raises an error; change to 'states, rewards, masks, actions = traj_list[0], traj_list[1], traj_list[2], traj_list[3]'? (see the unpacking in the sketch after this list)
  • AgentREDQ.py

    • line 187: update_net_? --> why update_net_? run.py (line 111) always calls agent.update_net, so it will call the parent class AgentSAC.update_net instead
  • config.py

    • line 35: self.n_step = 1 --> never used?
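
For reference, a minimal sketch (with hypothetical toy data, not the actual ElegantRL code) of the two proposed fixes for AgentBase.py line 282 and replay_buffer.py lines 129/130: transposing the per-step trajectory list with zip(*...) and then unpacking the resulting fields for the buffer.

```python
import torch

# Hypothetical trajectory: each step stores (state, reward, mask, action).
traj_list = [
    (torch.zeros(4), 1.0, 0.99, torch.zeros(2)),
    (torch.ones(4), 0.5, 0.99, torch.ones(2)),
]

# AgentBase.py line 282: `[map(list, zip(*traj_list))]` wraps a map object in a
# single-element list and breaks downstream indexing; a plain transpose works.
traj_list1 = list(zip(*traj_list))  # [(s0, s1), (r0, r1), (m0, m1), (a0, a1)]

# replay_buffer.py lines 129/130: unpack the transposed fields explicitly.
states, rewards, masks, actions = traj_list1[0], traj_list1[1], traj_list1[2], traj_list1[3]

states = torch.stack(states)    # shape (T, state_dim)
actions = torch.stack(actions)  # shape (T, action_dim)
print(states.shape, actions.shape)
```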

Yiyang-Wu avatar Sep 07 '22 13:09 Yiyang-Wu

@Yiyang-Wu Thanks for the detailed feedback!

@Yonv1943 Regarding line 116 of AgentBase.py, while i < target_step or not done: with "or", Python evaluates the first part, i < target_step, and the condition is true whenever i < target_step; once i >= target_step, it evaluates the second part, not done. The or-condition only becomes false when i >= target_step and the task is done.

  1. If the task is done before target_step, the code may misbehave, e.g., keep running even after the episode ends.
  2. If the task is not done before target_step, the code will keep running until it reaches done.

I guess your intended logic is "i < target_step and not done", meaning the while-loop ends in two cases: 1) the task ends before target_step; or 2) we stop at target_step while the task is still not done.
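
A small sketch of the difference between the two conditions (hypothetical variable names and a placeholder loop body, not the actual exploration code):

```python
# With `and`, the loop stops at target_step OR as soon as the episode ends.
i, done = 0, False
target_step = 8

while i < target_step and not done:
    # ... step the environment here; `done` would come from env.step() ...
    done = (i == 5)   # placeholder: pretend the episode ends at step 5
    i += 1

print(i)  # 6: stopped when the episode ended

# With `or`, the loop only exits once BOTH i >= target_step and done are true,
# so it keeps stepping a finished episode until i reaches target_step.
```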

YangletLiu avatar Sep 09 '22 17:09 YangletLiu

Thanks. I will build on the PR of @shixun404 and upload a Pull Request to fix these bugs, following the stable version of ElegantRL helloworld. (Wednesday)

the PR of shixun404

The stable version:

  1. line 91 agent.states and agent.state --> will be set as agent.states. states works for both single and vectorized envs; the tensor states has shape (vectorized_env_number, state_dimension). (see the sketch after this list)
  2. lines 109, 116, 282, 276, 129 --> In the stable version, the function explore_env will be simpler, because we don't need to collect a complete trajectory as before; there is no traj_list, convert_trajectory, etc. in this function.
  3. line 187: update_net_* should be update_net. I will upload some unit-testing code to help the functions of new algorithms conform to the naming conventions.
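
A minimal sketch of points 1 and 2, with assumed names (DummyVecEnv, AgentSketch, init_agent, explore_env, horizon_len are placeholders, not the actual ElegantRL API): agent.states always has shape (vectorized_env_number, state_dimension), with vectorized_env_number == 1 for a single env, and explore_env reuses it across calls instead of building a traj_list and calling convert_trajectory.

```python
import torch

class DummyVecEnv:
    """Toy vectorized env so the sketch is runnable; returns random next states."""
    def __init__(self, env_num: int, state_dim: int):
        self.env_num, self.state_dim = env_num, state_dim

    def reset(self):
        return torch.zeros(self.env_num, self.state_dim)

    def step(self, action):
        next_state = torch.randn(self.env_num, self.state_dim)
        reward = torch.zeros(self.env_num)
        done = torch.zeros(self.env_num, dtype=torch.bool)
        return next_state, reward, done, {}

class AgentSketch:
    def __init__(self, device: str = "cpu"):
        self.device = torch.device(device)
        self.states = None  # set once in init_agent, then carried across explore_env calls

    def init_agent(self, initial_obs: torch.Tensor):
        self.states = initial_obs.to(self.device)  # (vectorized_env_number, state_dimension)

    def explore_env(self, env, horizon_len: int):
        """Collect horizon_len transitions; no complete-trajectory bookkeeping needed."""
        env_num, state_dim = self.states.shape
        states = torch.empty(horizon_len, env_num, state_dim)
        for t in range(horizon_len):
            states[t] = self.states
            action = torch.zeros(env_num, 1)                 # placeholder policy
            self.states, reward, done, _ = env.step(action)  # keep only the latest states
        return states  # plus actions/rewards/dones in a real buffer

env = DummyVecEnv(env_num=4, state_dim=8)
agent = AgentSketch()
agent.init_agent(env.reset())
print(agent.states.shape)                             # torch.Size([4, 8])
print(agent.explore_env(env, horizon_len=16).shape)   # torch.Size([16, 4, 8])
```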

Yonv1943 avatar Sep 13 '22 03:09 Yonv1943

  1. line 91 agent.states and agent.state --> will be set as agent.states. states works for both single and vectorized envs; the tensor states has shape (vectorized_env_number, state_dimension).

The variable agent.states is not clear enough. From my perspective, states should be saved in the ReplayBuffer. However, it seems states are saved in AgentBase.

shixun404 avatar Sep 13 '22 09:09 shixun404