ReinforcementLearning.jl

add CUDA accelerated Env

Open norci opened this issue 5 years ago • 5 comments

Nvidia has ported Atari to CUDA: https://github.com/NVlabs/cule. The biggest benefit is that the data stay in GPU memory, which avoids copies between host and device.

My ideas are:

  • Add cule as a third-party env, then update the algorithms in ReinforcementLearningZoo.jl to use CuArray for actions, states, trajectory buffers, and so on.
  • Implement an env wrapper: a parallel GPU kernel that sets up the launch options and executes the built-in env.

Future work

Numba has many limitations in kernel code (https://numba.pydata.org/numba-doc/dev/cuda/overview.html), and implementing CUDA kernels in C++ is harder than in Numba. Julia, by contrast, has great support for CUDA programming, so it is the best choice for RL.

If this experiment is successful, we can port some third-party envs to Julia, or users can implement custom envs and train agents with this framework.

norci avatar Sep 25 '20 08:09 norci

I tried to implement a CUDA wrapper like this:

using CUDA

struct CudaEnv{E} <: AbstractEnv
    envs::CuArray{E}
end

# Needed by the host-side functor below.
Base.length(env::CudaEnv) = length(env.envs)

# GPU kernel: each thread steps its envs with the matching actions,
# using a grid-stride loop to cover the whole batch.
function launch_actions(envs, actions)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in index:stride:length(envs)
        @inbounds envs[i](actions[i])
    end
    return nothing
end

function (env::CudaEnv)(actions)
    numblocks = ceil(Int, length(env) / 256)
    CUDA.@sync begin
        @cuda threads = 256 blocks = numblocks launch_actions(env.envs, actions)
    end
end

But it seems CuArray does not support structs like these as its element type:

julia> adapt(CuArray, [CartPoleEnv() for i in 1:2])
ERROR: CuArray only supports bits types
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] CuArray{CartPoleEnv{Float64,Random._GLOBAL_RNG},1}(::UndefInitializer, ::Tuple{Int64}) at /root/.julia/packages/CUDA/dZvbp/src/array.jl:115
julia> using StructArrays
julia> replace_storage(CuArray, StructArray(CartPoleEnv() for i in 1:2))
ERROR: CuArray only supports bits types
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] CuArray{MultiContinuousSpace{Array{Float64,1}},1}(::UndefInitializer, ::Tuple{Int64}) at /root/.julia/packages/CUDA/dZvbp/src/array.jl:115

I think it's impossible to implement a generic CUDA wrapper in this framework if CUDA.jl does not support arbitrary data types.

We would have to implement envs for CUDA explicitly, e.g.:

  • the state of the env should be kept in an isbits struct, in order to use a StructArray;
  • the parameters of the env should be put into the GPU's global memory, in order to use them in a CUDA kernel.
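To illustrate the constraint, here is a small sketch (the struct names and field layout are hypothetical, not taken from the actual CartPoleEnv): only immutable structs whose fields are all plain bits can be stored element-wise in a CuArray, and isbitstype lets you check this up front.

```julia
# Hypothetical flat state struct: all fields are plain bits,
# so an array of these could be stored directly on the GPU.
struct CartPoleState
    x::Float64         # cart position
    xdot::Float64      # cart velocity
    theta::Float64     # pole angle
    thetadot::Float64  # pole angular velocity
end

# A struct holding a heap-allocated field is NOT isbits,
# which is exactly why `adapt(CuArray, [CartPoleEnv() ...])` fails.
struct BadState
    values::Vector{Float64}
end

isbitstype(CartPoleState)  # true  -> GPU friendly
isbitstype(BadState)       # false -> "CuArray only supports bits types"
```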

norci avatar Oct 23 '20 16:10 norci

What about updating the env so that the states and actions are matrices, so BLAS can accelerate the update? This might be faster than a MultiThreadEnv, since creating threads costs some time.

e.g.

# Note: Julia does not allow `A{T,2}` when `A` is a type parameter,
# so each array field gets its own parameter instead.
mutable struct CartPoleEnv{
    T,
    M<:AbstractMatrix{T},               # Matrix{T} on CPU, CuMatrix{T} on GPU
    VI<:AbstractVector{Int},
    VB<:AbstractVector{Bool},
    VR<:AbstractVector{<:AbstractRNG},  # one RNG per env in the batch
} <: AbstractEnv
    params::CartPoleEnvParams{T}
    action_space::DiscreteSpace{UnitRange{Int64}}
    observation_space::MultiContinuousSpace{Vector{T}}
    state::M      # state dimension × batch size
    action::VI
    done::VB
    t::VI
    rng::VR
end

norci avatar Oct 23 '20 16:10 norci

Some basic ideas of CUDA.jl are needed here. Just like the error message says, a customized struct must be isbits before we can send it to the GPU.

To implement the CartPoleEnv on GPU, we may need to split the internal state into immutable structs and send them to the GPU. Then, after applying actions, update the structs in place (rather than mutating the inner fields of each struct).
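That replacement pattern can be sketched as follows (PoleState and the advance function are hypothetical; on the GPU the same assignment would run inside a kernel over a CuArray{PoleState}):

```julia
# Immutable, isbits state: instead of mutating fields, each step
# constructs a new struct value and overwrites the array slot.
struct PoleState
    theta::Float64
    thetadot::Float64
end

# Hypothetical dynamics step: returns a NEW state value.
advance(s::PoleState, torque, dt) = PoleState(
    s.theta + dt * s.thetadot,
    s.thetadot + dt * torque,
)

states = [PoleState(0.0, 1.0) for _ in 1:4]
for i in eachindex(states)
    states[i] = advance(states[i], 0.5, 0.1)  # replace, don't mutate
end
```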

> This might be faster than a MultiThreadEnv, since creating threads costs some time.

Yes, it's totally possible. But note that MultiThreadEnv is a general wrapper for arbitrary envs.

findmyway avatar Oct 23 '20 17:10 findmyway

It seems cule ported the whole Atari emulator to the GPU. See:

  • https://github.com/NVlabs/cule/blob/master/cule/atari/wrapper.hpp#L140
  • https://github.com/NVlabs/cule/blob/master/cule/atari/cuda/kernels.hpp#L220
  • https://github.com/NVlabs/cule/blob/master/cule/atari/m6502.hpp

So the env can be run in batches.

I think the internal state should use an Array, since another env might have many state dimensions; we still need a 2-D array to store a batch of states, and that needs modification in the env.

If there were a way to store a CuArray pointer in the internal state, with the pointer maintained by Julia, it would be much easier.
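There is in fact a supported mechanism for roughly this in the CUDA.jl ecosystem: Adapt.jl. A host-side struct holding CuArray fields is not isbits itself, but declaring how it converts lets `@cuda` pass it to a kernel, where the fields arrive as device-side CuDeviceArrays. A sketch, assuming a GPU is available (BatchedState and zero_kernel! are illustrative names):

```julia
using CUDA, Adapt

# Host-side wrapper holding a device array. Not isbits itself, but
# Adapt teaches CUDA.jl how to convert it at kernel launch.
struct BatchedState{A<:AbstractMatrix{Float32}}
    data::A
end

# Generates `Adapt.adapt_structure`, so launching a kernel turns
# BatchedState{<:CuMatrix} into a device-compatible BatchedState.
Adapt.@adapt_structure BatchedState

function zero_kernel!(s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(s.data, 2)
        for j in 1:size(s.data, 1)
            @inbounds s.data[j, i] = 0f0
        end
    end
    return nothing
end

s = BatchedState(CUDA.rand(Float32, 4, 1024))
@cuda threads=256 blocks=4 zero_kernel!(s)  # struct passed whole
```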

norci avatar Oct 26 '20 09:10 norci

> I think the internal state should use an Array, since another env might have many state dimensions; we still need a 2-D array to store a batch of states, and that needs modification in the env.

That depends. I'm no expert in CUDA (I haven't touched it since graduation 😢), but I think in-place modification is not always required here.

findmyway avatar Oct 26 '20 12:10 findmyway