Questions about the RNN-related documentation
When using the multi-agent MAPPO algorithm, I tried replacing Basic_MLP with Basic_RNN and set use_rnn: True in the config file accordingly, after which I get the following error:
Traceback (most recent call last):
File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 22, in <module>
Agent.train(configs.running_steps // configs.parallels) # Train the model for numerous steps.
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 287, in train
self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 384, in run_episodes
policy_out = self.action(obs_dict=obs_dict, state=state, avail_actions_dict=avail_actions,
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 141, in action
rnn_hidden_critic_new, values_out = self.policy.get_values(observation=critic_input,
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/policies/gaussian_marl.py", line 176, in get_values
outputs = self.critic_representation[key](observation[key], *rnn_hidden[key])
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/rnn.py", line 63, in forward
output, hn = self.rnn(mlp_output, h)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 1117, in forward
raise RuntimeError(
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor
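For context, this PyTorch error can be reproduced in isolation: an nn.GRU given unbatched 2-D input requires an unbatched 2-D hidden state, so the traceback suggests the critic is passing a batched (3-D) hidden state alongside an unbatched observation. A minimal sketch (the sizes 14 and 64 merely mirror the observation space and config in this thread):

```python
import torch
import torch.nn as nn

# Minimal reproduction of the shape mismatch, outside XuanCe.
gru = nn.GRU(input_size=14, hidden_size=64, num_layers=1)

x = torch.zeros(5, 14)         # unbatched input: (seq_len, input_size)
h_ok = torch.zeros(1, 64)      # unbatched hidden: (num_layers, hidden_size)
out, hn = gru(x, h_ok)         # runs fine

h_bad = torch.zeros(1, 1, 64)  # batched hidden: (num_layers, batch, hidden_size)
try:
    gru(x, h_bad)              # 2-D input + 3-D hidden -> RuntimeError
except RuntimeError as e:
    print(e)
```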
It looks like a data-dimension problem. I searched the documentation but found no section covering this, and I don't know what else in my environment code needs to change. Here are my action, state, and observation spaces:
self.state_space = Box(-np.inf, np.inf, shape=[7 * self.num_agents, ], dtype=np.float32)
self.observation_space = {agent: Box(-np.inf, np.inf, shape=[14, ], dtype=np.float32) for agent in self.agents}
self.action_space = {agent: Box(-1, 1, shape=[2, ], dtype=np.float32) for agent in self.agents}
What else needs to be adjusted? Thanks for your help!
Hi, to change the representation to an RNN, you need to set the following options:
use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0
Hi, after modifying the config as described, I get the following error:
Traceback (most recent call last):
File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 27, in <module>
Agent = MAPPO_Agents(config=configs, envs=envs) # Create a DDPG agent from XuanCe.
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 24, in __init__
super(MAPPO_Agents, self).__init__(config, envs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/ippo_agents.py", line 24, in __init__
self.policy = self._build_policy() # build policy
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 38, in _build_policy
A_representation = self._build_representation(self.config.representation, self.observation_space, self.config)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/base/agents_marl.py", line 217, in _build_representation
representation[key] = REGISTRY_Representation[representation_key](**input_representations)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/mlp.py", line 40, in __init__
self.output_shapes = {'state': (hidden_sizes[-1],)}
KeyError: -1
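For context, a `KeyError: -1` at `hidden_sizes[-1]` is the signature of list-style indexing applied to a dict. A plausible (unverified) reading of the traceback is that with `use_rnn: True` XuanCe packs the layer sizes into a dict, which `Basic_MLP.__init__` then indexes as if it were a list; the key names below are assumptions for illustration only:

```python
# What Basic_MLP expects: a list of layer widths.
hidden_sizes_list = [64]
assert hidden_sizes_list[-1] == 64      # negative index works on a list

# Hypothetical dict built for the RNN case (key names assumed):
hidden_sizes_dict = {"fc_hidden_sizes": [64], "recurrent_hidden_size": 64}
try:
    hidden_sizes_dict[-1]               # dicts have no positional index
except KeyError as e:
    print(e)                            # -1
```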
Could it be that my state data format needs to change?
You can compare your file against the parameter configuration here and check whether the format matches: https://github.com/agi-brain/xuance/blob/master/examples/mappo/mappo_mpe_configs/simple_spread_v3.yaml
My config file was adapted from simple_spread_v3.yaml, and using that yaml directly also raises:
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor
In fact, running the MPE test directly with simple_spread_v3.yaml also fails once the RNN-related settings are enabled:
Traceback (most recent call last):
File "/Users/hawkq/Desktop/frigatebird_multi/testrun.py", line 13, in <module>
runner.run()
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/runners/runner_marl.py", line 32, in run
self.agents.train(n_train_steps)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 287, in train
self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 420, in run_episodes
_, value_next = self.values_next(i_env=i, obs_dict=obs_dict[i], state=state[i],
TypeError: 'NoneType' object is not subscriptable
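For context, `'NoneType' object is not subscriptable` here means `state` itself is `None` at the point where `run_episodes` evaluates `state[i]`, i.e. the vectorized environment returned no global state. A minimal sketch of the failure mode:

```python
# Minimal reproduction: indexing a None state, as in state[i] above.
state = None  # what the env wrapper apparently returned instead of a state array
try:
    state[0]
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```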
Hi, have you tested algorithms such as VDN or MADDPG? Do they show the same problem? That would help me pinpoint which stage the issue comes from.
Hi, since readthedocs provides only a limited set of config files, I modified just the MADDPG config file and added the following:
agent: "MADDPG" # the learning algorithms_marl
env_name: "fb"
env_id: "fb_v0"
env_seed: 1
continuous_action: True
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_RNN"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL"
distributed_training: False
use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, ]
critic_hidden_size: [64, ]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
MADDPG runs, but even with a single environment there is heavy disk I/O; I'm not sure whether that is inherent to RNN training.
For VDN, the key parts of the config file are:
agent: "VDN"
env_name: "fb"
env_id: "fb_v0"
env_seed: 1
continuous_action: True
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL"
distributed_training: False
use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu"
This raises the following error:
Traceback (most recent call last):
File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 29, in <module>
Agent = VDN_Agents(config=configs, envs=envs)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/vdn_agents.py", line 27, in __init__
self.policy = self._build_policy() # build policy
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/vdn_agents.py", line 44, in _build_policy
representation = self._build_representation(self.config.representation, self.observation_space, self.config)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/base/agents_marl.py", line 217, in _build_representation
representation[key] = REGISTRY_Representation[representation_key](**input_representations)
File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/mlp.py", line 40, in __init__
self.output_shapes = {'state': (hidden_sizes[-1],)}
KeyError: -1
This is the same error as reported above.
Hi, please make sure the representation parameter is set to "Basic_RNN":
representation: "Basic_RNN"
After changing VDN's config to representation: "Basic_RNN", it errored because my actions are continuous; after a quick switch to discrete actions it ran.
VDN only supports discrete actions.
Hi, have you tested algorithms such as VDN or MADDPG? Do they show the same problem? That would help me pinpoint which stage the issue comes from.
After testing: VDN runs, MADDPG runs, but MAPPO errors out:
Traceback (most recent call last):
File "C:\Users\HawkQ\Desktop\frigatebird_multi\new_run.py", line 30, in <module>
Agent.train(configs.running_steps // configs.parallels) # Train the model for numerous steps.
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\core\on_policy_marl.py", line 287, in train
self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\core\on_policy_marl.py", line 384, in run_episodes
policy_out = self.action(obs_dict=obs_dict, state=state, avail_actions_dict=avail_actions,
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\multi_agent_rl\mappo_agents.py", line 141, in action
rnn_hidden_critic_new, values_out = self.policy.get_values(observation=critic_input,
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\policies\gaussian_marl.py", line 176, in get_values
outputs = self.critic_representation[key](observation[key], *rnn_hidden[key])
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\representations\rnn.py", line 63, in forward
output, hn = self.rnn(mlp_output, h)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\rnn.py", line 1117, in forward
raise RuntimeError(
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor
Hi, has your problem been solved?
Not yet, unfortunately; for now I'm simply training without the RNN.