PPOxFamily
Chapter2 Application Demo
In this issue, we will post all application demo materials related to Lecture 2 of the course.
- Rocket recovery (discrete action space)
https://user-images.githubusercontent.com/33195032/209157737-d265161f-98f3-45fe-8e7d-8551183c2cce.mp4
- Drone attitude control (continuous action space)
https://user-images.githubusercontent.com/33195032/209158056-ba51538f-85f2-4241-897c-a609ff160186.mp4
- Traffic signal control (multi-dimensional discrete action space)
https://user-images.githubusercontent.com/33195032/209158470-8c085382-2917-4248-9801-d0389ac1228b.mp4
- Navigation control (hybrid action space: parameterized action space)
https://user-images.githubusercontent.com/33195032/209156336-ffd5cf4d-1c7c-4ef1-930f-f2e1249948c7.mp4
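Looking forward to the code.
Is there a detailed annotated walkthrough for the MultiDiscrete action space? I looked through the annotated code documentation and tutorials, and they seem to cover only the ordinary discrete action space. Thanks!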
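It is essentially the MultiHead feature implemented in DI-engine. You can read the source code here first; we will update the annotated code documentation in the course repo within this week.
Got it, thanks!
Hello, where can I find the updated code annotations for multihead that you mentioned in your reply? I have recently been trying to use PPO to output multi-dimensional actions and still haven't figured it out. Thanks!
I tried multihead following the explanation, but got an error: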
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscretePolicyNetMultiHead(nn.Module):
    def __init__(self, obs_dim, hidden_dim, action_dim) -> None:
        super(DiscretePolicyNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dim])

    def forward(self, x: torch.Tensor)->torch.Tensor:
        x = self.encoder(x)
        logit = [self.head(x) for head in self.heads]
        return logits

def sample_act(logit: torch.Tensor) -> torch.Tensor:
    probs = torch.softmax(logit, dim=-1)
    dists = [torch.distributions.Categorical(probs=prob) for prob in probs]
    return [dist.sample() for dist in dists]

def test_action_multihead():
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
    logit = policy_net(state)
    assert logit.shape == (B, action_shape)
    action = sample_act(logit)
    assert action.shape == (B,)
    return action

test_action_multihead()
TypeError Traceback (most recent call last)
/tmp/ipykernel_27/530012604.py in <module>
----> 1 test_action_multihead()
/tmp/ipykernel_27/2493506364.py in test_action_multihead()
2 B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
3 state = torch.rand(B, obs_shape)
----> 4 policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
5 logit = policy_net(state)
6 assert logit.shape == (B, action_shape)
/tmp/ipykernel_27/2688308212.py in __init__(self, obs_dim, hidden_dim, action_dim)
6 nn.ReLU(),
7 )
----> 8 self.head = nn.Linear(hidden_dim, action_dim)
9
10 def forward(self, x: torch.Tensor)->torch.Tensor:
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py in __init__(self, in_features, out_features, bias, device, dtype)
94 self.in_features = in_features
95 self.out_features = out_features
---> 96 self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
97 if bias:
98 self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
* (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
* (tuple of SymInts size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
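Reading the traceback above: the failing frame is the `__init__` of `DiscretePolicyNet`, the earlier single-head class, which passes the list `[6, 3]` straight into `nn.Linear`. The test never instantiates `DiscretePolicyNetMultiHead`. The new class also has a few problems of its own: `super()` names the wrong class, `forward` calls `self.head` instead of each `head` and returns the undefined name `logits`, and `sample_act` applies `softmax` to a Python list. Below is a minimal sketch of how the snippet might be corrected. It is not the code from DI-engine or the course repo. The `joint_log_prob` helper is a hypothetical addition and assumes the heads are treated as independent given the observation, which is a common choice when a PPO ratio is needed for a multi-dimensional discrete action.

```python
from typing import List, Sequence

import torch
import torch.nn as nn


class DiscretePolicyNetMultiHead(nn.Module):
    """Shared encoder followed by one linear head per action dimension."""

    def __init__(self, obs_dim: int, hidden_dim: int, action_dims: Sequence[int]) -> None:
        # Zero-argument super() avoids naming the wrong class.
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dims])

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        x = self.encoder(x)
        # One logit tensor of shape (B, dim_k) per head.
        return [head(x) for head in self.heads]


def sample_act(logits: List[torch.Tensor]) -> torch.Tensor:
    # One categorical distribution per head; samples are stacked to (B, num_heads).
    dists = [torch.distributions.Categorical(logits=logit) for logit in logits]
    return torch.stack([dist.sample() for dist in dists], dim=-1)


def joint_log_prob(logits: List[torch.Tensor], action: torch.Tensor) -> torch.Tensor:
    # ASSUMPTION (hypothetical helper): heads are independent given the observation,
    # so the joint log-probability is the sum of per-head log-probabilities. Shape: (B,).
    log_probs = [
        torch.distributions.Categorical(logits=logit).log_prob(action[..., i])
        for i, logit in enumerate(logits)
    ]
    return torch.stack(log_probs, dim=-1).sum(dim=-1)


def test_action_multihead() -> torch.Tensor:
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    # Instantiate the multi-head class, not the old single-head one.
    policy_net = DiscretePolicyNetMultiHead(obs_shape, hidden_shape, action_shape)
    logits = policy_net(state)
    for logit, dim in zip(logits, action_shape):
        assert logit.shape == (B, dim)
    action = sample_act(logits)
    assert action.shape == (B, len(action_shape))
    assert joint_log_prob(logits, action).shape == (B,)
    return action
```

Calling `test_action_multihead()` returns an integer tensor with one column per head, so for `action_shape = [6, 3]` each row holds one index below 6 and one index below 3.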
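You can now refer to this example: https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py#L58
Thanks! I will rewrite my code based on your example.
Could you share the PPO code for the MultiDiscrete and Discrete action spaces, as well as the complete code for traffic signal control?
Hello, I ran docker pull to get the latest opendilab/ding:nightly-mujoco image, then ran pip install git+https://github.com/zjowowen/gym-pybullet-drones@master inside it to try the drones example, but got this error: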
root@BF4-C-008T7:/workspaces/PPOxFamily# pip install git+https://github.com/zjowowen/gym-pybullet-drones@master
Collecting git+https://github.com/zjowowen/gym-pybullet-drones@master
Cloning https://github.com/zjowowen/gym-pybullet-drones (to revision master) to /tmp/pip-req-build-wy0jagd4
Running command git clone --filter=blob:none --quiet https://github.com/zjowowen/gym-pybullet-drones /tmp/pip-req-build-wy0jagd4
Resolved https://github.com/zjowowen/gym-pybullet-drones to commit b35eed32c251cc69c2d7b0de74dd9a66ca1357b1
Installing build dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
Collecting poetry-core@ git+https://github.com/python-poetry/poetry-core.git@master
Cloning https://github.com/python-poetry/poetry-core.git (to revision master) to /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
Running command git clone --filter=blob:none --quiet https://github.com/python-poetry/poetry-core.git /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
WARNING: Did not find branch or tag 'master', assuming revision or ref.
Running command git checkout -q master
error: pathspec 'master' did not match any file(s) known to git.
error: subprocess-exited-with-error
× git checkout -q master did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× git checkout -q master did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
I installed poetry-core manually, but that did not help either. It looks like the branch name master needs to be changed to main?
@PaParaZz1 Do you have any suggestions?
Solved. You need to clone the whole drones repo with git clone https://github.com/zjowowen/gym-pybullet-drones.git, then change master to main in the line requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"], and then run pip install -e . manually inside that repo. After that it installs.
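The manual edit described above can also be scripted. The following is a minimal sketch, assuming the `requires = [...]` line lives in the cloned repo's `pyproject.toml` (the thread does not name the file). The function names are hypothetical.

```python
from pathlib import Path

OLD_REF = "poetry-core.git@master"
NEW_REF = "poetry-core.git@main"


def patch_build_requires(text: str) -> str:
    """Point the poetry-core build requirement at `main` instead of `master`."""
    return text.replace(OLD_REF, NEW_REF)


def patch_file(path: Path) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_build_requires(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original
```

After cloning, something like `patch_file(Path("gym-pybullet-drones/pyproject.toml"))` followed by `pip install -e .` inside the repo would reproduce the steps above. The path is an assumption about the repo layout.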
Hi,
This repo [https://github.com/zjowowen/gym-pybullet-drones.git] has now been updated to match the original repo [https://github.com/utiasDSL/gym-pybullet-drones].
Thanks for reminding us!
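@zjowowen
After getting the code to run, I still cannot reproduce the drones_fly_demo. After training for 5e6 steps with the default parameters, the return still did not look good. I then loaded the best saved model and recorded a video, and found that the drone flies over the gate rather than passing through underneath it. Are there any other settings needed to achieve the result shown in your demo?
Hello, I keep running into the following problem when running the demo. I wonder whether anyone else has hit the same issue.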
Traceback (most recent call last):
File "
[10-20 22:34:24] WARNING subprocess reset set seed failed, ignore and continue... subprocess_env_manager.py:263
subprocess exception traceback:
Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 312, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] 管道已结束。
Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\envs\env_manager\subprocess_env_manager.py", line 259, in reset
ret = self._pipe_parents.recv()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 250, in recv
buf = self._recv_bytes()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 321, in _recv_bytes
raise EOFError
EOFError
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
[10-20 22:34:26] ERROR Env 2 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 1 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 3 reset has exceeded max retries(5) subprocess_env_manager.py:317
wandb: View run dutiful-pond-1 at: https://wandb.ai/anony-mouse-788424711663011732/bipedalwalker_demo/runs/uomu1uw0?apiKey=dc8282c6be97b578e2fa87aac8b882089ab2adaf
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20231020_223406-uomu1uw0\logs
Does the repo contain complete code and a training procedure for multidiscrete PPO?
Requesting the multidiscrete + PPO code for traffic light control.
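You can refer to the relevant examples in DI-smartcross. Because the cityflow environment is fairly complex, we did not integrate it directly into the course repo, so please head over to DI-smartcross. Link
Hello, is the environment code for the drone attitude control (continuous action space) case available? I would like to use it as a reference for how to apply reinforcement learning on top of a PID controller.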