
self.states[0]

Open shixun404 opened this issue 2 years ago • 3 comments

@Yonv1943 self.states[0] is very hard to understand. Could you suggest a clearer approach? https://github.com/AI4Finance-Foundation/ElegantRL/blob/98b83959cc6e62f17e7c2ad104c5050e84bd7297/helloworld/helloworld_PPO_single_file.py#L194

shixun404 • Sep 15 '22 01:09

Stable Baselines3 also sets self._last_obs in its training pipeline classes, class OffPolicyAlgorithm and class OnPolicyAlgorithm, as follows:

        self._last_obs = new_obs
        # Save the unnormalized observation
        if self._vec_normalize_env is not None:
            self._last_original_obs = new_obs_

https://github.com/DLR-RM/stable-baselines3/blob/18b29a68e8d5a11d0e98aeea539c247f0a913019/stable_baselines3/common/off_policy_algorithm.py#L517-L520

The class that runs the training pipeline needs to create a variable in __init__. This variable, last_obs, stores the observation at which the previous trajectory stopped before it finished, so that training can pick it up again.

With last_obs, the training pipeline class can continue exploring from the point where the previous trajectory stopped. This avoids calling reset every time, which would let the agent explore only the first part of each trajectory.
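
A minimal sketch of this idea, not taken from any of the libraries discussed here. ToyEnv, RolloutCollector, explore and the (next_obs, reward, done) step signature are all hypothetical names and conventions chosen for illustration. The point is only that the pipeline object resets once in __init__, resets again only when an episode ends, and saves the observation where each rollout stops.

```python
import random


class ToyEnv:
    """Hypothetical environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is simply the step index

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done  # (next_obs, reward, done)


class RolloutCollector:
    """Hypothetical training-pipeline class that owns the last observation."""

    def __init__(self, env):
        self.env = env
        self.last_obs = env.reset()  # reset once, when the pipeline is built

    def explore(self, num_steps):
        """Collect `num_steps` transitions, resuming from `self.last_obs`."""
        transitions = []
        obs = self.last_obs
        for _ in range(num_steps):
            action = random.choice([0, 1])  # stand-in for a policy
            next_obs, reward, done = self.env.step(action)
            transitions.append((obs, action, reward, done))
            # Reset only when the episode actually ends.
            obs = self.env.reset() if done else next_obs
        self.last_obs = obs  # remember where this rollout stopped
        return transitions


collector = RolloutCollector(ToyEnv(horizon=5))
first = collector.explore(num_steps=3)   # visits observations 0, 1, 2
second = collector.explore(num_steps=3)  # resumes at 3, then 4, then resets to 0
```

If explore called env.reset() at the start of every call instead, both rollouts would visit only observations 0, 1, 2 and the episode would never reach its end.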


The Tianshou library handles last_obs in a somewhat more complex way. In the __init__ of class Collector, it:

  • calls self.reset() https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L77
  • which calls self.reset_env https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L120
  • which in turn assigns self.data.obs https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L154

The assignment is made here: https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L136-L154

        if returns_info:
            obs, info = rval
            if self.preprocess_fn:
                processed_data = self.preprocess_fn(
                    obs=obs, info=info, env_id=np.arange(self.env_num)
                )
                obs = processed_data.get("obs", obs)
                info = processed_data.get("info", info)
            self.data.info = info
        else:
            obs = rval
            if self.preprocess_fn:
                obs = self.preprocess_fn(obs=obs, env_id=np.arange(self.env_num
                                                                   )).get("obs", obs)
        self.data.obs = obs
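
The branch on returns_info above exists because reset() may return either a bare observation or an (obs, info) pair. A rough standalone sketch of that normalization step follows. split_reset_return and LastObsHolder are hypothetical names, and the tuple check is an assumption made for illustration, not Tianshou's actual test; it would misclassify an observation that is itself a 2-tuple ending in a dict.

```python
def split_reset_return(rval):
    """Return (obs, info) for either reset() convention.

    Assumption: a 2-tuple whose second element is a dict means (obs, info);
    anything else is treated as a bare observation.
    """
    if isinstance(rval, tuple) and len(rval) == 2 and isinstance(rval[1], dict):
        obs, info = rval
    else:
        obs, info = rval, {}
    return obs, info


class LastObsHolder:
    """Hypothetical stand-in for the object that stores the latest observation."""

    def __init__(self):
        self.obs = None
        self.info = {}

    def store_reset(self, rval):
        # Normalize first, then keep the observation for the next rollout.
        self.obs, self.info = split_reset_return(rval)


holder = LastObsHolder()
holder.store_reset([0.1, 0.2])                 # observation only
old_style = holder.obs
holder.store_reset(([0.3, 0.4], {"seed": 7}))  # (obs, info) pair
```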

So the mapping is: DRL library --> where the last observation is stored

  • ElegantRL: AgentBase.py --> self.states
  • Stable Baselines3: OffPolicyAlgorithm or OnPolicyAlgorithm --> self._last_obs
  • Tianshou: BaseTrainer --> Collector --> self.reset() --> self.reset_env() --> self.data.obs

Yonv1943 • Sep 15 '22 09:09

Stable Baselines3 also sets self._last_obs in its training pipeline classes, class OffPolicyAlgorithm and class OnPolicyAlgorithm, as follows:

  • The variable name self._last_obs is much clearer than self.states[0].

The class that runs the training pipeline needs to create a variable in __init__. This variable, last_obs, stores the observation at which the previous trajectory stopped before it finished, so that training can pick it up again.

  • last_obs is a part of the training process. It is neither reasonable nor clear to put this variable in class AgentBase.

With last_obs, the training pipeline class can continue exploring from the point where the previous trajectory stopped. This avoids calling reset every time, which would let the agent explore only the first part of each trajectory.

  • Continuing to explore from the last observation is a useful feature.

shixun404 • Sep 15 '22 09:09

The following pull request fixes this bug: "Fix bug for vec env and agentbase init" https://github.com/AI4Finance-Foundation/ElegantRL/pull/248

Yonv1943 • Jan 09 '23 02:01