Error when running ./train_mpe_spread.sh
When I tried to run ./train_mpe_spread.sh, I ran into the following error:
obs_space: [Box(18,), Box(18,), Box(18,)]
share_obs_space: [Box(54,), Box(54,), Box(54,)]
act_space: [Discrete(5), Discrete(5), Discrete(5)]
Traceback (most recent call last):
File "../train/train_mpe.py", line 174, in <module>
main(sys.argv[1:])
File "../train/train_mpe.py", line 159, in main
runner.run()
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/mpe_runner.py", line 28, in run
values, actions, action_log_probs, rnn_states, rnn_states_critic, actions_env = self.collect(step)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/mpe_runner.py", line 103, in collect
np.concatenate(self.buffer.masks[step]))
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/rMAPPOPolicy.py", line 71, in get_actions deterministic)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/r_actor_critic.py", line 64, in forward
actor_features = self.base(obs)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/mlp.py", line 56, in forward
x = self.mlp(x)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/mlp.py", line 27, in forward
x = self.fc1(x)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/functional.py", line 1610, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Try running the following snippet and see whether you still get the error:
import torch

# Basic environment sanity checks
print("Is CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())

# A single matrix multiplication on the GPU exercises the same cuBLAS
# path (cublasSgemm) that fails in the traceback above
a = torch.randn(1024, 1024, device="cuda:0")
b = torch.randn(1024, 1024, device="cuda:0")
c = torch.matmul(a, b)
torch.cuda.synchronize()  # force any deferred CUDA error to surface here
print("Matrix multiplication result shape:", c.shape)
If the snippet fails with the same error, the problem is in your PyTorch/CUDA installation rather than in this repo. Try reinstalling PyTorch from the official channel:
conda install pytorch -c pytorch
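If a plain reinstall doesn't help, the usual culprit is a mismatch between the CUDA toolkit your PyTorch build was compiled against and your GPU driver. Pinning a matching toolkit version often fixes it; for the torch 1.5.x / Python 3.6 environment visible in the traceback, something like the following (the exact version pins here are just an example, adjust them to your driver):

conda install pytorch==1.5.1 cudatoolkit=10.2 -c pytorch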
Fixed! I tried the new code and it works now.