ReinforcementLearning.jl
Bounds Error at UPDATE_FREQ Step
Hello,
I've recreated a simple version of my project at the following link on my GitHub: https://github.com/ryan-o-c/RL-Env-Debugger/blob/main/Debugger.ipynb
The broad strokes are that I want to pass an image and vector to a CNN/Dense NN mix in my agent, and I seem to have managed this by passing the state as a flat vector and reshaping it inside the agent's NN architecture. However, when I reach the update stage, the generalised advantage function throws a bounds error that I don't quite understand.
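In case it helps, here is a minimal sketch of the reshape-inside-the-model pattern I mean (the 5×5 image, the vector length, and the layer sizes are made up for illustration and are not the notebook's real dimensions):

using Flux

# Hypothetical layout of the flat state: a 5×5 image followed by a length-3 feature vector.
img_len, vec_len = 5 * 5, 3

cnn   = Chain(Conv((3, 3), 1 => 4, relu; pad = 1), Flux.flatten)  # image branch
dense = Dense(vec_len, 8, relu)                                   # vector branch

model = Chain(
    x -> (reshape(x[1:img_len, :], 5, 5, 1, size(x, 2)),          # split the flat state...
          x[img_len+1:end, :]),                                   # ...into image and vector parts
    ((img, vec),) -> vcat(cnn(img), dense(vec)),                  # run each branch, then merge
    Dense(4 * 5 * 5 + 8, 4),                                      # e.g. one output per action
)

model(rand(Float32, img_len + vec_len, 7))  # returns a (4, 7) matrix: one column per batch element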
Please let me know if there's anything else I can do to outline the error more clearly - any help is greatly appreciated!
(Small note on the code: the actions, state, and reward are greatly simplified for the sake of brevity, and so are essentially irrelevant (unless, of course, there is some reason they cause the error!). Also, the environment only "works" for n=5 - I simplified my code as much as possible while still ending at the same error as my more complex version of the environment.)
I spent some time debugging it. It seems the policy part has a problem:
AC.critic(flatten_batch(states_plus))
1×1 Matrix{Float32}:
0.038754486
This should return a matrix of size (1, batch_size). So you may need to check that part.
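A minimal sanity check along these lines (the sizes here are just placeholders, not the ones in your notebook):

using Flux

state_dim, batch_size = 27, 32                     # placeholder sizes for illustration
critic = Chain(Dense(state_dim, 64, relu), Dense(64, 1))

states_flat = rand(Float32, state_dim, batch_size) # stands in for flatten_batch(states_plus)
size(critic(states_flat))                          # should be (1, 32): one value estimate per state

If you get (1, 1) instead, the batch dimension is probably being collapsed somewhere inside the critic (for example by a reshape that only accounts for a single state).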
The broad strokes are that I want to pass an image and vector to a CNN/Dense NN mix in my agent, and I seem to have managed this by passing the state as a flat vector and reshaping it inside the agent's NN architecture.
Yeah, that's a bit awkward to do in the current version. I promise this will be addressed in the next version (because I'm also working on a problem with different types of states involved).
Thank you so much for the fast response - you're super helpful! :)
I'm still not quite there on debugging the original code, so I've tried to implement a different, simpler environment instead. Unfortunately, I have run into other errors! In my new reproduced example (link below) it's just an A2C agent that passes the state vector into a simple dense NN. This time the error is coming from the epsilon-greedy explorer function. To be more specific, it gets stuck at the following piece (I added a println for debugging!):
function (s::EpsilonGreedyExplorer{<:Any,false})(values, mask)
    ϵ = get_ϵ(s)
    s.is_training && (s.step += 1)
    println("values: ", values, " and mask: ", mask, " ")
    rand(s.rng) >= ϵ ? findmax(values, mask)[2] : rand(s.rng, findall(mask))
end
It gives the error on the line rand(s.rng) >= ϵ ? findmax(values, mask)[2] : rand(s.rng, findall(mask))
I will paste the entire error output below, but the first line reads: ERROR: MethodError: objects of type Vector{Float32} are not callable. Use square brackets [] for indexing an Array.
I think that the values variable being fed to the explorer function is a policy sampled from the learner, and when I print the values and mask variables to the screen and manually run findmax(values, mask)[2] the findmax seems to run fine. I have used some print statements to determine that the code only fails when the "rand(s.rng) >= ϵ" clause returns true, so I'm puzzled as to where the error is coming from.
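One detail I notice in the stacktrace below, in case it is relevant: frame [8] shows the call landing in Base's two-argument findmax(f, domain), added in Julia 1.7, which treats its first argument as a function to apply, and the mask arrives there as a Vector{Any} rather than, say, a Vector{Bool}. Here is a standalone sketch of that Base behaviour, with made-up values and nothing from ReinforcementLearning.jl involved:

values = Float32[0.1, 0.9, 0.3]

# With an untyped mask, findmax(values, mask) dispatches to Base.findmax(f, domain) (Julia ≥ 1.7),
# which tries to call `values` like a function:
mask_any = Any[true, true, false]
# findmax(values, mask_any)   # MethodError: objects of type Vector{Float32} are not callable

# Selecting the allowed entries explicitly with a Bool mask sidesteps that Base method:
mask_bool = Bool[true, true, false]
idx = findall(mask_bool)
findmax(values[idx])          # (0.9f0, 2)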
I've tried my best to get to the root of the problem but I can't seem to get any further - please let me know if there's anything more I can do/provide!
Link to code: https://github.com/ryan-o-c/RL-Env-Debugger/blob/main/Debugger.ipynb
(There is some redundant code at the very bottom - I left a note to say to ignore that part. The print statement I put in the explorer function makes for chunky output in the run(...) cell, so you can safely scroll past that too.)
Finally, the full error output reads:
MethodError: objects of type Vector{Float32} are not callable
Use square brackets [] for indexing an Array.
Stacktrace:
[1] (::Base.var"#260#261"{Vector{Float32}})(::Pair{Int64, Any}) @ Base ./reduce.jl:803
[2] MappingRF @ ./reduce.jl:95 [inlined]
[3] _foldl_impl @ ./reduce.jl:58 [inlined]
[4] foldl_impl(op::Base.MappingRF{Base.var"#260#261"{Vector{Float32}}, Base.BottomRF{typeof(Base._rf_findmax)}}, nt::Base._InitialValue, itr::Base.Pairs{Int64, Any, LinearIndices{1, Tuple{Base.OneTo{Int64}}}, Vector{Any}}) @ Base ./reduce.jl:48
[5] mapfoldl_impl(f::Base.var"#260#261"{Vector{Float32}}, op::typeof(Base._rf_findmax), nt::Base._InitialValue, itr::Base.Pairs{Int64, Any, LinearIndices{1, Tuple{Base.OneTo{Int64}}}, Vector{Any}}) @ Base ./reduce.jl:44
[6] mapfoldl(f::Function, op::Function, itr::Base.Pairs{Int64, Any, LinearIndices{1, Tuple{Base.OneTo{Int64}}}, Vector{Any}}; init::Base._InitialValue) @ Base ./reduce.jl:162
[7] mapfoldl(f::Function, op::Function, itr::Base.Pairs{Int64, Any, LinearIndices{1, Tuple{Base.OneTo{Int64}}}, Vector{Any}}) @ Base ./reduce.jl:162
[8] findmax(f::Vector{Float32}, domain::Vector{Any}) @ Base ./reduce.jl:803
[9] (::EpsilonGreedyExplorer{:exp, false, Random._GLOBAL_RNG})(values::Vector{Float32}, mask::Vector{Any}) @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/policies/q_based_policies/explorers/epsilon_greedy_explorer.jl:129
[10] (::QBasedPolicy{A2CLearner{ActorCritic{Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ADAM}}, EpsilonGreedyExplorer{:exp, false, Random._GLOBAL_RNG}})(env::MyGrid, #unused#::FullActionSet, #unused#::Base.OneTo{Int64}) @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/policies/q_based_policies/q_based_policy.jl:25
[11] QBasedPolicy @ ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/policies/q_based_policies/q_based_policy.jl:22 [inlined]
[12] Agent @ ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/policies/agents/agent.jl:24 [inlined]
[13] _run(policy::Agent{QBasedPolicy{A2CLearner{ActorCritic{Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ADAM}}, EpsilonGreedyExplorer{:exp, false, Random._GLOBAL_RNG}}, CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 3, Array{Float32, 3}}, CircularArrayBuffers.CircularArrayBuffer{Int64, 2, Matrix{Int64}}, CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularArrayBuffer{Bool, 2, Matrix{Bool}}}}}}, env::MyGrid, stop_condition::StopAfterEpisode{ProgressMeter.Progress}, hook::TotalBatchRewardPerEpisode) @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/core/run.jl:27
[14] run(policy::Agent{QBasedPolicy{A2CLearner{ActorCritic{Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ADAM}}, EpsilonGreedyExplorer{:exp, false, Random._GLOBAL_RNG}}, CircularArraySARTTrajectory{NamedTuple{(:state, :action, :reward, :terminal), Tuple{CircularArrayBuffers.CircularArrayBuffer{Float32, 3, Array{Float32, 3}}, CircularArrayBuffers.CircularArrayBuffer{Int64, 2, Matrix{Int64}}, CircularArrayBuffers.CircularArrayBuffer{Float32, 2, Matrix{Float32}}, CircularArrayBuffers.CircularArrayBuffer{Bool, 2, Matrix{Bool}}}}}}, env::MyGrid, stop_condition::StopAfterEpisode{ProgressMeter.Progress}, hook::TotalBatchRewardPerEpisode) @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/s9XPF/src/core/run.jl:10
[15] top-level scope @ In[3]:1
[16] eval @ ./boot.jl:373 [inlined]
[17] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1196
WARNING: both Losses and NNlib export "ctc_loss"; uses of it in module Flux must be qualified