imitation
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Bug description
Traceback (most recent call last):
  File "/home/kavin/Documents/PycharmProjects/RL/Imitation/example.py", line 150, in <module>
    bc_trainer.train(n_epochs=1)
  File "/home/kavin/anaconda3/envs/PythonEnv/lib/python3.8/site-packages/imitation/algorithms/bc.py", line 495, in train
    training_metrics = self.loss_calculator(self.policy, obs_tensor, acts)
  File "/home/kavin/anaconda3/envs/PythonEnv/lib/python3.8/site-packages/imitation/algorithms/bc.py", line 130, in __call__
    (_, log_prob, entropy) = policy.evaluate_actions(
  File "/home/kavin/anaconda3/envs/PythonEnv/lib/python3.8/site-packages/stable_baselines3/common/policies.py", line 736, in evaluate_actions
    log_prob = distribution.log_prob(actions)
  File "/home/kavin/anaconda3/envs/PythonEnv/lib/python3.8/site-packages/stable_baselines3/common/distributions.py", line 292, in log_prob
    return self.distribution.log_prob(actions)
  File "/home/kavin/anaconda3/envs/PythonEnv/lib/python3.8/site-packages/torch/distributions/categorical.py", line 127, in log_prob
    return log_pmf.gather(-1, value).squeeze(-1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
Training appears to fail on a GPU: I get this error when I set the device to "cuda:0", but not when I set it to "cpu".
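The last frame of the traceback can be isolated from imitation and stable-baselines3. The snippet below is only a minimal sketch, not code from either library: the tensor names and shapes are made up for illustration. It assumes the failure comes from gather receiving an index tensor on the CPU while the log-probabilities live on the GPU, which is what the error message suggests.

```python
import torch as th

# Illustrative stand-ins: log-probabilities over 2 actions for 4 samples,
# and the demonstrated actions, both created on the CPU.
log_pmf = th.log_softmax(th.randn(4, 2), dim=-1)
actions = th.tensor([0, 1, 1, 0])

if th.cuda.is_available():
    # Mimic the policy living on the GPU while the actions stay on the CPU.
    log_pmf = log_pmf.to("cuda:0")
    try:
        log_pmf.gather(-1, actions.unsqueeze(-1))
    except RuntimeError as err:
        print(f"Reproduced: {err}")

# Moving the index tensor to the same device as log_pmf avoids the mismatch.
index = actions.to(log_pmf.device).unsqueeze(-1)
log_prob = log_pmf.gather(-1, index).squeeze(-1)
print(log_prob.device)
```

On a machine without CUDA the try block is skipped and only the same-device gather runs.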
Steps to reproduce
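import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

from imitation.algorithms import bc
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.util.util import make_vec_env

print("Loading expert demonstrations...")
rng = np.random.default_rng(0)
env = gym.make("CartPole-v1")
venv = make_vec_env(
    "CartPole-v1",
    post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
    rng=rng,
)
# Expert trained beforehand and saved as ppo_cartpole.zip
expertAgent = PPO.load("ppo_cartpole.zip", print_system_info=False)

print("Rollouts...")
rollouts = rollout.rollout(
    expertAgent,
    venv,
    rollout.make_sample_until(min_timesteps=None, min_episodes=60),
    rng=rng,
)

bc_trainer = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=rollouts,
    rng=rng,
    device="cuda:0",  # the error occurs with "cuda:0" but not with "cpu"
)

reward, _ = evaluate_policy(
    bc_trainer.policy,
    venv,
    n_eval_episodes=3,
    render=False,
)
print(f"Reward before training: {reward}")

print("Training a policy using Behavior Cloning")
bc_trainer.train(n_epochs=1)  # raises the RuntimeError shown above

reward, _ = evaluate_policy(
    bc_trainer.policy,
    env,
    n_eval_episodes=3,
    render=False,
)
print(f"Reward after training: {reward}")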
Environment
- Operating system and version: Ubuntu 20.04
- Python version: 3.8
- Output of pip freeze --all: stable-baselines3==2.2.0
I get this error too: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
It appears on the call to train() in at least two algorithms: dagger_trainer.train() and bc_trainer.train().
imitation version: 1.0
This error occurs because the function safe_to_tensor in imitation/src/imitation/util/util.py returns a tensor without transferring it to 'cuda'.
To fix safe_to_tensor, replace line 259, which reads "return array", with: return th.as_tensor(array, **kwargs)
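For illustration, here is a rough sketch of what the patched function might look like. It is not copied from the imitation source. The name safe_to_tensor_sketch is hypothetical, and the surrounding structure (an early-return branch for inputs that are already tensors, and a copy for read-only arrays) is an assumption about how the library's function is laid out. Only the replacement of "return array" with th.as_tensor(array, **kwargs) comes from the comment above.

```python
from typing import Union

import numpy as np
import torch as th


def safe_to_tensor_sketch(array: Union[np.ndarray, th.Tensor], **kwargs) -> th.Tensor:
    """Hypothetical stand-in for safe_to_tensor with the proposed fix applied."""
    if isinstance(array, th.Tensor):
        # Proposed fix: instead of `return array`, let th.as_tensor apply the
        # device/dtype kwargs. It returns the input as-is if nothing differs.
        return th.as_tensor(array, **kwargs)
    if not array.flags.writeable:
        # Assumed behavior: copy read-only arrays, since th.as_tensor can
        # share memory with the NumPy array on the CPU.
        array = array.copy()
    return th.as_tensor(array, **kwargs)


device = th.device("cuda:0" if th.cuda.is_available() else "cpu")
acts = th.tensor([0, 1, 1, 0])  # created on the CPU, like the demonstration actions
moved = safe_to_tensor_sketch(acts, device=device)
print(moved.device)
```

With this change a tensor passed in with device="cuda:0" ends up on the GPU, so the actions should land on the same device as the policy before log_prob is computed.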