ReinforcementLearning.jl
Add CUDA-accelerated Env
Nvidia has ported Atari to CUDA: https://github.com/NVlabs/cule. The biggest benefit is that the data stay in GPU memory, which avoids memory copies between host and device.
My ideas are:

- add cule as a 3rd-party env
- update the algorithms in ReinforcementLearningZoo.jl to use CuArray for actions, states, trajectory buffers, etc. (a sketch of such a buffer follows this list)
- implement an env wrapper: a parallel GPU kernel which sets up the launch options and executes the built-in env
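For the trajectory-buffer part, here is a hedged sketch of what a GPU-resident buffer could look like (the layout and names are assumptions, not actual ReinforcementLearningZoo.jl types):

```julia
using CUDA

# Hypothetical GPU-resident trajectory buffer: every field is preallocated as
# a CuArray, so rollout data never leaves the device.
struct CuTrajectory{S<:CuMatrix,A<:CuVector,R<:CuVector}
    states::S    # state_dim × capacity, one column per step
    actions::A
    rewards::R
end

CuTrajectory(state_dim, capacity) = CuTrajectory(
    CUDA.zeros(Float32, state_dim, capacity),
    CUDA.zeros(Int, capacity),
    CUDA.zeros(Float32, capacity),
)
```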
Future work

Numba has many limitations in kernel code: https://numba.pydata.org/numba-doc/dev/cuda/overview.html. Implementing CUDA kernels in C++ is even more difficult than using Numba. Julia, however, has great support for CUDA programming, so it is the best choice for RL here.

If this experiment is successful, we can port some 3rd-party envs to Julia, or users can implement custom envs and train agents with this framework.
I tried to implement a CUDA wrapper like this:
```julia
using CUDA

struct CudaEnv{E} <: AbstractEnv
    envs::CuArray{E}
end

Base.length(env::CudaEnv) = length(env.envs)

# GPU kernel: each thread steps its slice of the envs via a grid-stride loop.
function launch_actions(envs, actions)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in index:stride:length(envs)
        @inbounds envs[i](actions[i])
    end
    return nothing
end

function (env::CudaEnv)(actions)
    numblocks = ceil(Int, length(env) / 256)
    CUDA.@sync begin
        @cuda threads = 256 blocks = numblocks launch_actions(env.envs, actions)
    end
end
```
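The intended usage would be something like this (hypothetical, and it fails in practice, as shown below):

```julia
envs = CudaEnv(CuArray([CartPoleEnv() for _ in 1:1024]))
actions = CUDA.fill(1, 1024)
envs(actions)   # step all 1024 envs in parallel on the GPU
```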
But it seems CuArray does not support using such a struct as its element type:
```julia
julia> adapt(CuArray, [CartPoleEnv() for i in 1:2])
ERROR: CuArray only supports bits types
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] CuArray{CartPoleEnv{Float64,Random._GLOBAL_RNG},1}(::UndefInitializer, ::Tuple{Int64}) at /root/.julia/packages/CUDA/dZvbp/src/array.jl:115

julia> using StructArrays

julia> replace_storage(CuArray, StructArray(CartPoleEnv() for i in 1:2))
ERROR: CuArray only supports bits types
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] CuArray{MultiContinuousSpace{Array{Float64,1}},1}(::UndefInitializer, ::Tuple{Int64}) at /root/.julia/packages/CUDA/dZvbp/src/array.jl:115
```
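The root cause is that CartPoleEnv is a mutable struct holding heap-allocated fields (e.g. its spaces, which wrap Arrays), so it is not an isbits type, which CuArray elements must be:

```julia
julia> isbitstype(typeof(CartPoleEnv()))
false

julia> isbitstype(Float32)
true
```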
I think it's impossible to implement a generic CUDA wrapper in this framework if CUDA.jl does not support arbitrary data types.
We have to implement the env for CUDA explicitly, e.g.:

- the state of the env should be put in a struct, in order to use a StructArray.
- the parameters of the env should be put into the GPU's global memory, in order to use them in a CUDA kernel.
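A minimal sketch of both points, with hypothetical field and parameter names (not the actual CartPoleEnv internals):

```julia
using CUDA, StructArrays

# Per-env state as an immutable isbits struct: a batch of these can live in a
# StructArray whose field arrays are CuArrays.
struct CartPoleState{T}
    x::T
    ẋ::T
    θ::T
    θ̇::T
end

# Shared parameters as a separate isbits struct, passable to a kernel by value.
struct CartPoleParams{T}
    gravity::T
    masscart::T
    masspole::T
    dt::T
end

# a batch of 1024 states, with each field stored as its own CuArray
states = replace_storage(CuArray, StructArray(CartPoleState(0.0f0, 0.0f0, 0.0f0, 0.0f0) for _ in 1:1024))
params = CartPoleParams(9.8f0, 1.0f0, 0.1f0, 0.02f0)
```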
What about updating the env so that the states and actions are matrices, so as to use BLAS to accelerate the update? This might be faster than a MultiThreadEnv, since creating threads costs some time.
e.g.:

```julia
# Each array field gets its own type parameter, since Julia does not allow
# applying parameters to a type variable (`A{T,2}` with `A<:Union{Array,CuArray}`
# is not valid in a field declaration).
mutable struct CartPoleEnv{
    T,
    TS<:AbstractMatrix{T},   # Matrix{T} on CPU, CuMatrix{T} on GPU
    TA<:AbstractVector{Int},
    TD<:AbstractVector{Bool},
    TT<:AbstractVector{Int},
    TR<:AbstractVector,      # one RNG per env instance
} <: AbstractEnv
    params::CartPoleEnvParams{T}
    action_space::DiscreteSpace{UnitRange{Int64}}
    observation_space::MultiContinuousSpace{Vector{T}}
    state::TS                # one column per env in the batch
    action::TA
    done::TD
    t::TT
    rng::TR
end
```
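Given such a layout, here is a hedged sketch of the batched update (toy dynamics for illustration, not the real CartPole equations): with the states stored as a 4×N matrix, one broadcasted expression steps the whole batch, and it fuses into a single GPU kernel when the matrix is a CuArray.

```julia
using CUDA

@views function batch_step!(state::AbstractMatrix, action::AbstractVector, dt)
    force = 10.0f0 .* (2 .* action .- 3)   # map action ∈ {1, 2} to -10/+10
    state[1, :] .+= dt .* state[2, :]      # x  += dt * ẋ
    state[2, :] .+= dt .* force            # ẋ += dt * force (toy dynamics)
    state[3, :] .+= dt .* state[4, :]      # θ  += dt * θ̇
    return state
end

state  = CUDA.zeros(Float32, 4, 1024)      # 4 state dims × 1024 envs
action = CUDA.fill(1, 1024)
batch_step!(state, action, 0.02f0)
```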
Some basic knowledge of CUDA.jl is needed here. As the error message says, a custom struct must be an isbits type before we can send it to the GPU.

To implement the CartPoleEnv on the GPU, we may need to split the internal state into immutable structs and send them to the GPU. Then, after applying actions, we update the structs in place in their array (rather than mutating the inner fields of each struct).
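A hedged sketch of that idea, reusing the hypothetical CartPoleState/CartPoleParams structs from the sketch above (and assuming the GPU-backed StructArray can be passed to the kernel; StructArrays ships Adapt rules for this):

```julia
# Each thread replaces one whole isbits struct instead of mutating its fields.
function step_kernel!(states, params)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(states)
        s = states[i]
        # simplified toy dynamics, not the real CartPole equations
        states[i] = CartPoleState(s.x + params.dt * s.ẋ, s.ẋ, s.θ + params.dt * s.θ̇, s.θ̇)
    end
    return nothing
end

@cuda threads = 256 blocks = cld(length(states), 256) step_kernel!(states, params)
```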
> This might be faster than a MultiThreadEnv, since creating threads costs some time.

Yes, it's totally possible. But note that MultiThreadEnv is a general wrapper for arbitrary envs.
It seems cule ported the whole Atari emulator to the GPU. See:
- https://github.com/NVlabs/cule/blob/master/cule/atari/wrapper.hpp#L140
- https://github.com/NVlabs/cule/blob/master/cule/atari/cuda/kernels.hpp#L220
- https://github.com/NVlabs/cule/blob/master/cule/atari/m6502.hpp
So the env can be run in batches.
I think the internal state should use an Array, since other envs might have many state dimensions; we would still need a 2-d array to store a batch of states, and this requires modifying the env.

If there were a way to store a CuArray pointer in the internal state, with the pointer maintained by Julia, it would be much easier.
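CUDA.jl actually supports something close to this through Adapt.jl: a host-side struct can hold CuArrays, and an adapt rule lets `@cuda` convert it into a device-side struct of device arrays at launch time. A minimal sketch (hypothetical struct name):

```julia
using CUDA, Adapt

# Host-side container: not isbits itself, but its fields are CuArrays.
struct BatchState{A}
    x::A
    θ::A
end

# Defines the adapt rule so `cudaconvert` recurses into the fields, turning
# each CuArray into a CuDeviceArray when the struct is passed to a kernel.
Adapt.@adapt_structure BatchState
```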
> I think the internal state should use an Array, since other envs might have many state dimensions; we would still need a 2-d array to store a batch of states, and this requires modifying the env.

That depends. I'm no expert in CUDA (I haven't touched it since graduation 😢). But I think in-place modification is not always required here.
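For illustration, a small hedged example of that point: building a new state batch functionally, instead of mutating, can be perfectly acceptable, since the result stays on the device either way.

```julia
using CUDA

# Out-of-place step: returns a fresh CuMatrix instead of mutating `state`.
step(state, dt) = vcat(
    state[1:1, :] .+ dt .* state[2:2, :],   # integrate the first row
    state[2:end, :],                        # remaining rows unchanged (toy example)
)

state = CUDA.rand(Float32, 4, 1024)
state = step(state, 0.02f0)                 # rebinds to a new on-device array
```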