
Improve compatibility with CUDA-enabled pytorch on non-CUDA devices

mainrs opened this issue 9 months ago · 3 comments

Problem Description

The pytorch CUDA-enabled libraries are more capable than the CPU-only ones. They can also run on the CPU if no CUDA device is available. However, due to a race condition, the current code base calls into the CUDA driver even if one passes --no-cuda as an argument.

The issue is this line of code: https://github.com/vwxyzjn/cleanrl/blob/8cbca61360ef98660f149e3d76762350ce613323/cleanrl/dqn.py#L147

The code should first check whether the flag is set and only then call torch.cuda.is_available. That way, the program runs fine on devices without CUDA support.

Possible Solution

device = torch.device("cuda" if args.cuda and torch.cuda.is_available() else "cpu")
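A minimal, self-contained sketch of the idea. The `select_device` helper and the `broken_probe` stand-in are hypothetical, for illustration only; cleanrl inlines this as the one-liner above:

```python
def select_device(use_cuda: bool, cuda_available) -> str:
    """Pick a device string, consulting the user's flag before probing CUDA.

    Because ``and`` short-circuits, ``cuda_available`` is never called when
    ``use_cuda`` is False, so no CUDA driver code runs under --no-cuda.
    """
    return "cuda" if use_cuda and cuda_available() else "cpu"


def broken_probe() -> bool:
    # Stand-in for torch.cuda.is_available() on a machine whose CUDA
    # drivers are missing or stubbed out.
    raise RuntimeError("CUDA driver not available")


# With the flag off, the probe is never touched and we fall back to CPU.
print(select_device(False, broken_probe))  # -> cpu
# With the flag on and a working probe, we get the CUDA device.
print(select_device(True, lambda: True))   # -> cuda
```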

mainrs avatar May 07 '24 09:05 mainrs

Could you clarify why this is a "race condition"?

It shouldn't be cuda unless both conditions are true, so I don't understand why the ordering would matter in this case.

pseudo-rnd-thoughts avatar May 08 '24 21:05 pseudo-rnd-thoughts

Because Python's and is lazy: if the first operand is false, it doesn't evaluate the second one. But the current code checks torch.cuda.is_available first, so if I pass --no-cuda, Python still runs torch.cuda.is_available, even though I specified that I do not want to use CUDA.
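The ordering difference can be demonstrated with a plain-Python stand-in for the availability probe (the `probe` function below is hypothetical and simply raises, mimicking a machine without CUDA drivers):

```python
def probe() -> bool:
    # Stand-in for torch.cuda.is_available() on a driverless machine.
    raise RuntimeError("CUDA driver not available")

use_cuda = False  # i.e. --no-cuda was passed

# Proposed ordering: the flag is checked first, probe() never runs.
print(use_cuda and probe())  # -> False

# Current ordering: probe() runs first and raises despite the flag.
try:
    probe() and use_cuda
except RuntimeError as err:
    print("raised:", err)
```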

On devices that don't have CUDA drivers (or have the stub drivers) but have the CUDA version of PyTorch installed, this throws a runtime error. However, using the CPU on such devices is valid, since the PyTorch library can still function by using the CPU.

mainrs avatar May 09 '24 09:05 mainrs

Ok, that makes sense. I thought that pytorch would be smart enough to not raise a runtime error for this function. I'll make a PR with your suggested change.

pseudo-rnd-thoughts avatar May 09 '24 22:05 pseudo-rnd-thoughts