PPOxFamily Chapter2 Application Demo

In this issue, we will post all application demo materials related to Lecture 2 of the course.

Training code link

Rocket recovery (discrete action space)

https://user-images.githubusercontent.com/33195032/209157737-d265161f-98f3-45fe-8e7d-8551183c2cce.mp4
Drone attitude control (continuous action space)

https://user-images.githubusercontent.com/33195032/209158056-ba51538f-85f2-4241-897c-a609ff160186.mp4
Traffic signal control (multi-dimensional discrete action space)

https://user-images.githubusercontent.com/33195032/209158470-8c085382-2917-4248-9801-d0389ac1228b.mp4
Navigation control (hybrid action space: parameterized action space)

https://user-images.githubusercontent.com/33195032/209156336-ffd5cf4d-1c7c-4ef1-930f-f2e1249948c7.mp4

Dec 22 '22 14:12 PaParaZz1

Looking forward to the code.

Jan 17 '23 15:01 EasonQYS

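Is there a detailed side-by-side walkthrough for the multiDiscrete action space? I looked through the annotated code documentation tutorials, and they seem to cover only the plain discrete action space. Thanks!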

Mar 12 '23 19:03 jianzuo

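It is essentially the MultiHead feature implemented in DI-engine. You can start by reading the source code here; we will update the annotated code documentation in the course repo within this week.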
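As a rough illustration of the multi-head idea: a multi-discrete action can be modeled with one categorical head per action dimension. If the heads are assumed to be independent given the state, the joint log-probability that enters the PPO ratio is the sum of the per-head log-probabilities. The sketch below is written under that assumption and is not DI-engine's implementation; the function names `joint_log_prob` and `ppo_ratio` are hypothetical.

```python
import torch
from torch.distributions import Categorical


def joint_log_prob(logits_per_head, actions_per_head):
    """Sum per-head log-probs; assumes heads are independent given the state."""
    log_probs = [
        Categorical(logits=logit).log_prob(action)
        for logit, action in zip(logits_per_head, actions_per_head)
    ]
    # Stack to (num_heads, B), then sum over heads to get shape (B,).
    return torch.stack(log_probs, dim=0).sum(dim=0)


def ppo_ratio(new_logits, old_logits, actions):
    """Importance ratio pi_new(a|s) / pi_old(a|s) for the joint action."""
    return torch.exp(
        joint_log_prob(new_logits, actions) - joint_log_prob(old_logits, actions)
    )


# Toy batch: 4 states, two action dimensions with 6 and 3 choices.
B = 4
logits = [torch.randn(B, 6), torch.randn(B, 3)]
actions = [torch.randint(0, 6, (B,)), torch.randint(0, 3, (B,))]
ratio = ppo_ratio(logits, logits, actions)  # identical policies, so ratio is 1
```

Other designs are possible, for example computing a separate PPO loss per head and averaging; which one a given library uses should be checked in its source.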

Mar 15 '23 02:03 PaParaZz1

Understood, thanks!

Mar 15 '23 07:03 jianzuo

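Hello, where can I find the updated code annotations on multihead that you mentioned in your reply? I have recently been trying to use PPO to output multi-dimensional actions and still have not figured it out. Thanks!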

Mar 25 '23 09:03 jianzuo

Following the explanation, I tried multihead, but it raised an error:

import torch
import torch.nn as nn
import torch.nn.functional as F
class DiscretePolicyNetMultiHead(nn.Module):
    def __init__(self, obs_dim, hidden_dim, action_dim) -> None:
        super(DiscretePolicyNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dim])
        
        
    def forward(self, x: torch.Tensor)->torch.Tensor:
        x = self.encoder(x)
        logit = [self.head(x) for head in self.heads]
        return logits
    
def sample_act(logit: torch.Tensor) -> torch.Tensor:
    probs = torch.softmax(logit, dim=-1)
    dists = [torch.distributions.Categorical(probs=prob) for prob in probs]
    return [dist.sample() for dist in dists]

def test_action_multihead():
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
    logit = policy_net(state)
    assert logit.shape == (B, action_shape)
    action = sample_act(logit)
    assert action.shape == (B,)
    return action

test_action_multihead()

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_27/530012604.py in <module>
----> 1 test_action_multihead()

/tmp/ipykernel_27/2493506364.py in test_action_multihead()
      2     B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
      3     state = torch.rand(B, obs_shape)
----> 4     policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
      5     logit = policy_net(state)
      6     assert logit.shape == (B, action_shape)

/tmp/ipykernel_27/2688308212.py in __init__(self, obs_dim, hidden_dim, action_dim)
      6             nn.ReLU(),
      7         )
----> 8         self.head = nn.Linear(hidden_dim, action_dim)
      9 
     10     def forward(self, x: torch.Tensor)->torch.Tensor:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py in __init__(self, in_features, out_features, bias, device, dtype)
     94         self.in_features = in_features
     95         self.out_features = out_features
---> 96         self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
     97         if bias:
     98             self.bias = Parameter(torch.empty(out_features, **factory_kwargs))

TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of SymInts size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

Mar 25 '23 11:03 jianzuo

You can now refer to this example: https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py#L58
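For readers comparing against the failing snippet, here is a minimal sketch of what a corrected multi-head version might look like. It is not the code from the linked tutorial. The traceback points at `self.head = nn.Linear(hidden_dim, action_dim)` inside `DiscretePolicyNet.__init__`, which suggests the test instantiated an earlier single-head class and passed it the list `[6, 3]`. The sketch also assumes the remaining issues in the snippet need fixing: the `super()` call names the wrong class, `self.head(x)` should be `head(x)`, `logit` is returned as `logits`, and the softmax and shape assertions treat a list of tensors as a single tensor.

```python
import torch
import torch.nn as nn


class DiscretePolicyNetMultiHead(nn.Module):
    def __init__(self, obs_dim, hidden_dim, action_dims):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        # One linear head per action dimension; each takes an int output size.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, d) for d in action_dims])

    def forward(self, x):
        x = self.encoder(x)
        # Returns a list of logits, one tensor of shape (B, d) per head.
        return [head(x) for head in self.heads]


def sample_act(logits):
    # Sample each action dimension from its own categorical distribution.
    return [torch.distributions.Categorical(logits=l).sample() for l in logits]


def test_action_multihead():
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNetMultiHead(obs_shape, hidden_shape, action_shape)
    logits = policy_net(state)
    assert [l.shape for l in logits] == [(B, n) for n in action_shape]
    actions = sample_act(logits)
    assert all(a.shape == (B,) for a in actions)
    return actions


actions = test_action_multihead()
```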

Mar 28 '23 08:03 PaParaZz1

Thanks! I will rewrite it based on your example.

Mar 28 '23 13:03 jianzuo

Could you share the PPO code for the multiDiscrete and Discrete action spaces, as well as the complete code for traffic signal control?

Aug 01 '23 02:08 lz-8713

Hello, I pulled the latest opendilab/ding:nightly-mujoco image with docker pull and ran pip install git+https://github.com/zjowowen/gym-pybullet-drones@master inside it to try the drones example, but it failed:

root@BF4-C-008T7:/workspaces/PPOxFamily# pip install git+https://github.com/zjowowen/gym-pybullet-drones@master
Collecting git+https://github.com/zjowowen/gym-pybullet-drones@master
  Cloning https://github.com/zjowowen/gym-pybullet-drones (to revision master) to /tmp/pip-req-build-wy0jagd4
  Running command git clone --filter=blob:none --quiet https://github.com/zjowowen/gym-pybullet-drones /tmp/pip-req-build-wy0jagd4
  Resolved https://github.com/zjowowen/gym-pybullet-drones to commit b35eed32c251cc69c2d7b0de74dd9a66ca1357b1
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Collecting poetry-core@ git+https://github.com/python-poetry/poetry-core.git@master
        Cloning https://github.com/python-poetry/poetry-core.git (to revision master) to /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        Running command git clone --filter=blob:none --quiet https://github.com/python-poetry/poetry-core.git /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        WARNING: Did not find branch or tag 'master', assuming revision or ref.
        Running command git checkout -q master
        error: pathspec 'master' did not match any file(s) known to git.
        error: subprocess-exited-with-error
      
        × git checkout -q master did not run successfully.
        │ exit code: 1
        ╰─> See above for output.
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: subprocess-exited-with-error
      
      × git checkout -q master did not run successfully.
      │ exit code: 1
      ╰─> See above for output.
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Installing poetry-core manually did not help either. It looks like the branch name master needs to be changed to main? @PaParaZz1 do you have any suggestions?

Aug 24 '23 08:08 zhixiongzh

Solved. You need to clone the whole drones repo with git clone https://github.com/zjowowen/gym-pybullet-drones.git, change master to main in the line requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"], and then run pip install -e . manually inside that repo.
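The edit step of this fix can be sketched as a small Python helper. It assumes the `requires = [...]` line lives in the cloned repo's `pyproject.toml` (the usual place for build requirements; the thread does not name the file), and the helper names `patch_poetry_core_ref` and `patch_file` are hypothetical. Cloning and `pip install -e .` are still run from the shell.

```python
from pathlib import Path

OLD_REF = "poetry-core.git@master"
NEW_REF = "poetry-core.git@main"


def patch_poetry_core_ref(text: str) -> str:
    """Point the poetry-core build requirement at 'main' instead of 'master'."""
    return text.replace(OLD_REF, NEW_REF)


def patch_file(path: Path) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_poetry_core_ref(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
        return True
    return False


# Example on the line quoted in the thread:
line = 'requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"]'
patched_line = patch_poetry_core_ref(line)
print(patched_line)
```

A possible usage, assuming the repo was cloned into the current directory: call `patch_file(Path("gym-pybullet-drones/pyproject.toml"))`, then run `pip install -e .` inside `gym-pybullet-drones`.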

Aug 25 '23 07:08 zhixiongzh

Hi,

This repo [https://github.com/zjowowen/gym-pybullet-drones.git] has been updated to match the original repo [https://github.com/utiasDSL/gym-pybullet-drones].

Thanks for reminding us!

Aug 25 '23 10:08 zjowowen

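@zjowowen After getting the code to run, I still cannot reproduce the drones_fly_demo. After training for 5e6 steps with the default parameters, the return did not look good. I then loaded the best saved model and recorded a video, and found that the drone flies over the gate instead of passing through underneath it. Are there any other settings needed to achieve the result shown in your demo?

[image: return]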

Aug 29 '23 08:08 zhixiongzh

Hello, I keep running into the following problem when running the demo. Has anyone else hit the same issue?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\utils\compression_helper.py", line 24, in __setstate__
    self.data = cloudpickle.loads(data)
TypeError: _generator_ctor() takes from 0 to 1 positional arguments but 2 were given

[10-20 22:34:24] WARNING subprocess reset set seed failed, ignore and continue... subprocess_env_manager.py:263
subprocess exception traceback:
Traceback (most recent call last):
  File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended.

Traceback (most recent call last):
  File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\envs\env_manager\subprocess_env_manager.py", line 259, in reset
    ret = self._pipe_parents.recv()
  File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
[10-20 22:34:26] ERROR Env 2 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 1 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 3 reset has exceeded max retries(5) subprocess_env_manager.py:317
wandb: View run dutiful-pond-1 at: https://wandb.ai/anony-mouse-788424711663011732/bipedalwalker_demo/runs/uomu1uw0?apiKey=dc8282c6be97b578e2fa87aac8b882089ab2adaf
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20231020_223406-uomu1uw0\logs