
Minor bug: agent is using cuda:0 device no matter what rl_device arg is

Open · gunnxx opened this issue 2 years ago · 2 comments

Problem

ase.learning.common_agent.CommonAgent inherits from rl_games.common.a2c_common.A2CBase, which places all of its tensors on self.ppo_device.
self.ppo_device is read from the device key of the agent config. If that key is missing, it defaults to cuda:0 (see here).
Tracing back to run.py, that config is supplied by cfg_train["params"]["config"]. If you print cfg_train["params"]["config"].keys(), there is no device key.
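A minimal sketch of the fallback described above, written in plain Python so it runs without rl_games or a GPU. resolve_ppo_device is a hypothetical stand-in for the device lookup in A2CBase, and the config contents are placeholders, not the actual values from ase_humanoid.yaml.

```python
# Hypothetical stand-in for the device lookup in rl_games' A2CBase:
# read "device" from the agent config, falling back to "cuda:0".
def resolve_ppo_device(config: dict) -> str:
    return config.get("device", "cuda:0")


# Placeholder for cfg_train["params"]["config"]; like the real one,
# it has no "device" key.
train_config = {"name": "placeholder"}

requested = "cuda:1"  # what --rl_device asked for
actual = resolve_ppo_device(train_config)
print(f"requested {requested}, agent uses {actual}")
# requested cuda:1, agent uses cuda:0
```

Because the lookup never sees --rl_device, the requested device is silently ignored.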

How to check

To check this issue, run the original pretraining command with the --rl_device argument set to another CUDA device, such as cuda:1. The run still allocates memory on cuda:0.

python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless --rl_device cuda:1

How to fix

To fix this, add cfg_train["params"]["config"]["device"] = args.rl_device in the load_cfg() function.
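A sketch of the proposed one-line fix, pulled into a standalone helper so its effect can be checked in isolation. set_agent_device is a hypothetical name; the issue suggests putting the assignment directly inside load_cfg(). The nested cfg_train dict is a placeholder with the same shape as the real one.

```python
from argparse import Namespace


def set_agent_device(cfg_train: dict, args: Namespace) -> dict:
    """Copy --rl_device into the agent config (hypothetical helper).

    With "device" present, the config.get("device", "cuda:0") lookup in
    the agent no longer falls back to cuda:0.
    """
    cfg_train["params"]["config"]["device"] = args.rl_device
    return cfg_train


# Placeholder training config with no "device" key, as in the issue.
cfg_train = {"params": {"config": {"name": "placeholder"}}}
args = Namespace(rl_device="cuda:1")

set_agent_device(cfg_train, args)
print(cfg_train["params"]["config"]["device"])  # cuda:1
```

Setting the key where the config is loaded keeps a single source of truth: the agent picks up whatever --rl_device says without any change to rl_games itself.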

Oct 25 '22 11:10 gunnxx
