ElegantRL
ERL issues
run.py
- line 91: agent.state --> agent.states is already set in init_agent, and states is used instead of state.
- line 109: buffer.update_buffer --> already called in line 107 (see the sketch below).
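A minimal sketch of what the corrected section of run.py might look like under these two fixes; the helper names (init_agent, explore_env, update_buffer) come from the report above, everything else is assumed:

```python
# Hypothetical, simplified excerpt of the training loop in run.py,
# reflecting the two suggested fixes.

agent = init_agent(args, gpu_id, env)  # init_agent already sets agent.states

# (line 91) no extra assignment to agent.state here: agent.states is
# already initialized in init_agent, and `states` is the attribute the
# rest of the code actually reads.

trajectory = agent.explore_env(env, args.target_step)
buffer.update_buffer(trajectory)  # (line 107) push the new transitions once
# (line 109) the second buffer.update_buffer(trajectory) call is removed,
# since it would insert the same transitions twice.
```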
AgentBase.py
- line 116: while i < target_step or not done --> the 'or' indicates the loop always runs until done; replace with 'and'?
- line 282: traj_list1 = [map(list, zip(*traj_list))] --> raises an error; change to traj_list1 = list(zip(*traj_list))? --> is there a stable version?
- line 276: convert_trajectory function --> is it possible to get rid of the for-loop to make it more efficient? (See the sketch after this list.)
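For the two trajectory items above, a minimal sketch with hypothetical per-step data (the real shapes come from the env):

```python
import torch

# Hypothetical trajectory: one (state, reward, mask, action) tuple per step.
traj_list = [
    (torch.zeros(4), 1.0, 0.99, torch.zeros(2)),
    (torch.ones(4),  0.5, 0.99, torch.ones(2)),
]

# (line 282) [map(list, zip(*traj_list))] wraps a single map object in a
# one-element list, so indexing the result per field fails. Transposing
# with list(zip(...)) gives one tuple per field:
traj_list1 = list(zip(*traj_list))  # [states, rewards, masks, actions]

# (line 276) instead of a Python for-loop over steps, each field can be
# assembled with a single vectorized call:
states  = torch.stack(traj_list1[0])                # (step, state_dim)
rewards = torch.tensor(traj_list1[1]).unsqueeze(1)  # (step, 1)
masks   = torch.tensor(traj_list1[2]).unsqueeze(1)  # (step, 1)
actions = torch.stack(traj_list1[3])                # (step, action_dim)
```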
replay_buffer.py
- lines 129/130: traj_list --> raises an error; change to states, rewards, masks, actions = traj_list[0], traj_list[1], traj_list[2], traj_list[3]? (See the sketch below.)
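A small sketch of the suggested unpacking, with placeholder tensors standing in for the real trajectory:

```python
import torch

# Placeholder trajectory fields; shapes chosen for illustration only.
traj_list = [
    torch.zeros(8, 4),  # states  (step, state_dim)
    torch.zeros(8, 1),  # rewards (step, 1)
    torch.zeros(8, 1),  # masks   (step, 1)
    torch.zeros(8, 2),  # actions (step, action_dim)
]

# (lines 129/130) unpack the four fields by explicit index ...
states, rewards, masks, actions = traj_list[0], traj_list[1], traj_list[2], traj_list[3]
# ... or, equivalently, by iterable unpacking:
states, rewards, masks, actions = traj_list
```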
AgentREDQ.py
- line 187: update_net_? --> why update_net_? run.py (line 111) always calls agent.update_net, so it will call the parent class AgentSAC.update_net instead (see the sketch below).
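A minimal sketch of the dispatch problem being described; the class bodies are placeholders, not the real ElegantRL implementations:

```python
class AgentSAC:
    def update_net(self, buffer):
        return "AgentSAC.update_net"

class AgentREDQ(AgentSAC):
    # (line 187) named update_net_ instead of update_net, so it never
    # overrides the parent method.
    def update_net_(self, buffer):
        return "AgentREDQ.update_net_"

agent = AgentREDQ()
print(agent.update_net(buffer=None))  # -> "AgentSAC.update_net"
# Renaming update_net_ to update_net makes the run.py call (line 111)
# dispatch to the REDQ-specific update.
```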
config.py
- line 35: self.n_step = 1 --> never used?
@Yiyang-Wu Thanks for the detailed feedback!
@Yonv1943 Regarding line 116 of AgentBase.py, while i < target_step or not done: with the 'or' condition, Python first checks i < target_step, which keeps the loop running while i < target_step regardless of done; once i >= target_step, it checks the second part, not done. So the or-condition only becomes false when i >= target_step and the task is done.
- If the task is done before target_step, the code may misbehave, e.g., keep running even though the episode has ended.
- If the task is not done by target_step, the code will keep running until it reaches done.
I guess your intended logic is i < target_step and not done, meaning the while-loop ends in either of two cases: 1) the task ends before target_step; or 2) we stop at target_step even though the task is not yet done. The small check below confirms the difference.
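A small check of the two conditions with concrete values (target_step and done stand in for the real variables):

```python
target_step = 4

for i, done in [(2, True), (5, False), (5, True)]:
    or_cond  = i < target_step or not done   # current code
    and_cond = i < target_step and not done  # proposed fix
    print(i, done, or_cond, and_cond)

# (2, True):  or -> True   (keeps looping past the episode end)
#             and -> False (stops as soon as the task is done)
# (5, False): or -> True   (runs past target_step until done)
#             and -> False (stops at target_step even if not done)
# (5, True):  both False   (both versions stop here)
```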
Thanks. I will base my fix on the PR of @shixun404 and upload a Pull Request to fix these bugs, following the stable version of ElegantRL helloworld. (Wednesday)
The stable version:
- line 91: `agent.states` and `agent.state` --> will be set as `agent.states`. `states` is a compatible version for both single and vectorized envs. The shape of tensor `states` is `(vectorized_env_number, state_dimension)`.
- lines 109, 116, 282, 276, 129 --> In the stable version, the function `explore_env` will be simpler, because we don't need to collect a complete trajectory as before. There is no `traj_list`, `convert_trajectory`, ... in this function. A sketch follows this list.
- line 187: `update_net_*` should be `update_net`. I will upload some code for unit testing, which should help the functions of new algorithms conform to naming conventions.
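To illustrate, a hedged sketch of what a simplified, stable-version-style `explore_env` could look like for a single env; the signature, attribute names, and gym-style env.step interface are assumptions, not the actual helloworld code:

```python
import torch

def explore_env(self, env, horizon_len: int):
    """Step a fixed horizon and return pre-stacked tensors directly,
    with no traj_list or convert_trajectory step (sketch only)."""
    states  = torch.zeros((horizon_len, self.state_dim))
    actions = torch.zeros((horizon_len, self.action_dim))
    rewards = torch.zeros((horizon_len, 1))
    dones   = torch.zeros((horizon_len, 1), dtype=torch.bool)

    state = self.states  # current state carried over from the previous call
    for t in range(horizon_len):
        action = self.select_action(state)
        next_state, reward, done, _ = env.step(action.numpy())
        states[t] = state
        actions[t] = action
        rewards[t] = reward
        dones[t] = done
        state = torch.as_tensor(env.reset() if done else next_state,
                                dtype=torch.float32)

    self.states = state  # resume from here on the next call
    return states, actions, rewards, dones
```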
> line 91: `agent.states` and `agent.state` --> will be set as `agent.states`. `states` is a compatible version for both single and vectorized envs. The shape of tensor `states` is `(vectorized_env_number, state_dimension)`.
The variable agent.states is not clear enough. From my perspective, states should be saved in the ReplayBuffer. However, it seems states are saved in AgentBase.
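One plausible reading of the design (an assumption, not confirmed by the maintainers): agent.states is not replay data but a cursor, the environment's current state carried across successive explore_env calls, while the ReplayBuffer stores completed past transitions. Sketched with placeholder calls:

```python
# Hypothetical training loop illustrating the split of responsibilities.
for update_round in range(100):
    trajectory = agent.explore_env(env, target_step)
    # explore_env starts from agent.states and writes the final state
    # back to agent.states before returning, so exploration resumes
    # exactly where the previous call stopped.
    buffer.update_buffer(trajectory)   # completed transitions live here
    agent.update_net(buffer)
```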