ElegantRL
self.states[0]
@Yonv1943 self.states[0] is very difficult to understand. Please suggest a better approach. https://github.com/AI4Finance-Foundation/ElegantRL/blob/98b83959cc6e62f17e7c2ad104c5050e84bd7297/helloworld/helloworld_PPO_single_file.py#L194
In stable-baselines3, they set self._last_obs in their training pipeline classes OffPolicyAlgorithm and OnPolicyAlgorithm, as shown below:
```python
self._last_obs = new_obs
# Save the unnormalized observation
if self._vec_normalize_env is not None:
    self._last_original_obs = new_obs_
```
https://github.com/DLR-RM/stable-baselines3/blob/18b29a68e8d5a11d0e98aeea539c247f0a913019/stable_baselines3/common/off_policy_algorithm.py#L517-L520
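For context, an assignment like this sits inside a rollout-collection loop, where the carried-over observation is also the policy input at each step. Below is a minimal sketch of that loop, assuming a classic gym-style single env; it is a simplification for illustration, not stable-baselines3's actual code:

```python
def collect_rollouts_sketch(env, policy, buffer, last_obs, num_steps):
    """Simplified sketch: last_obs is the observation carried over from the
    previous call; it is fed to the policy and updated after every step."""
    for _ in range(num_steps):
        action = policy(last_obs)
        new_obs, reward, done, info = env.step(action)
        buffer.append((last_obs, action, reward, done))
        # corresponds to the "self._last_obs = new_obs" idea above;
        # a single (non-vectorized) env also needs a reset when the episode ends
        last_obs = env.reset() if done else new_obs
    return last_obs  # the caller stores this back (e.g. as self._last_obs)
```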
The training pipeline class needs to create a variable in __init__. This variable, last_obs, records the observation at which the previous, still-unfinished trajectory stopped being explored, so that it is available for training.
With last_obs, the training pipeline class can continue exploring from the position in the previous trajectory that has not yet ended. This avoids calling reset every time, which would let the agent explore only the first part of each trajectory.
The Tianshou library handles last_obs in a somewhat more complex way. In the __init__ of class Collector, they:
- call self.reset() https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L77
- call self.reset_env https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L120
- and then also assign self.data.obs https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L154

The assignment happens at https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/data/collector.py#L136-L154:
```python
if returns_info:
    obs, info = rval
    if self.preprocess_fn:
        processed_data = self.preprocess_fn(
            obs=obs, info=info, env_id=np.arange(self.env_num)
        )
        obs = processed_data.get("obs", obs)
        info = processed_data.get("info", info)
    self.data.info = info
else:
    obs = rval
    if self.preprocess_fn:
        obs = self.preprocess_fn(
            obs=obs, env_id=np.arange(self.env_num)
        ).get("obs", obs)
self.data.obs = obs
```
So, for each DRL library:
- ElegantRL: AgentBase.py --> self.states
- StableBaselines3: OffPolicyAlgorithm or OnPolicyAlgorithm --> self._last_obs
- TianShou: BaseTrainer --> Collector --> self.reset() --> self.reset_env() --> self.data.obs
- The variable name self._last_obs is much clearer than self.states[0].
- last_obs is part of the training process. It is neither reasonable nor clear to put this variable in class AgentBase (see the sketch after this list).
- Continuing to explore from the last observation is good functionality.
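To make the design point concrete, here is a hypothetical before/after sketch; it illustrates the bullets above and is not the actual content of the pull request below:

```python
# Before: the exploration state is hidden inside the agent object.
class AgentBase:
    def __init__(self, env):
        self.states = [env.reset()]   # self.states[0] is the last observation

# After: the training pipeline owns the exploration state, under a clear name.
class Trainer:
    def __init__(self, env, agent):
        self.env = env
        self.agent = agent
        self.last_obs = env.reset()   # the observation to resume exploring from
```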
The following pull request fixes this bug: "Fix bug for vec env and agentbase init" https://github.com/AI4Finance-Foundation/ElegantRL/pull/248