baedan

13 issues of baedan

Zygote v0.6.41, Julia 1.7.3

MWE:

```julia
using Zygote

α, β = randn(2, 2), randn(2, 2)

g(v) = map(eachcol(v), eachcol(β)) do x, y
    sum(x .* x .* y)
end |> sum

# this fails
gradient(α)...
```
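for context, the failing call is truncated above; given the "second order" label, the nesting pattern being exercised presumably looks something like this (a sketch of the shape only; the issue reports the first-order call already failing):

```julia
# sketch of second-order differentiation through `g`; the issue's actual
# failing call is cut off above, so this only shows the nesting pattern
h(v) = sum(first(gradient(g, v)))  # scalar function of the first-order gradient
gradient(h, α)                     # differentiating h yields second-order terms
```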

second order

is there a way to create a copy of an existing model with a new, _flat_ parameter vector, _without mutating_, and _without using `restructure`_? the reason for the latter two...
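one way to get this without mutation (a sketch, assuming the model is Functors.jl-compatible, as Flux layers are; `with_flat_params` is a hypothetical helper, not an existing API):

```julia
using Functors

# hypothetical helper: rebuild `model` with every numeric array replaced by a
# view into `flat`, leaving the original model untouched
function with_flat_params(model, flat::AbstractVector)
    offset = Ref(0)
    fmap(model; exclude = x -> x isa AbstractArray{<:Number}) do x
        n = length(x)
        y = reshape(view(flat, offset[] + 1:offset[] + n), size(x))
        offset[] += n
        y
    end
end
```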

gradients
optimisers-dot-jl

this PR implements Trust-Region Policy Optimization, and adds a CartPole experiment for it. to this end, i wrote a few utility functions that are shared amongst policy gradient policies (#737)....
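for reference, the core of TRPO (Schulman et al., 2015) is a surrogate objective maximized inside a KL trust region; a minimal sketch of those two quantities (illustrative only, not this PR's code):

```julia
using Statistics

# L(θ) = E[ πθ(a|s) / πθ_old(a|s) · A(s,a) ]   s.t.   E[ KL(πθ_old ‖ πθ) ] ≤ δ
surrogate(logπ, logπ_old, adv) = mean(exp.(logπ .- logπ_old) .* adv)
within_trust_region(kls; δ = 0.01) = mean(kls) ≤ δ
```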

hello! while going through `vpg.jl`, i had some odds-and-ends questions. i'm still pretty new to julia and especially Flux.jl, so please bear with me :D 1. i don't understand the point...
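for anyone skimming, the estimator `vpg.jl` revolves around is the score-function (REINFORCE) gradient, ∇J(θ) = E[∇θ log πθ(a|s) · Gₜ]; a minimal sketch, assuming `policy(s)` returns a `Categorical` over actions (the names are illustrative, not the file's actual API):

```julia
using Flux, Distributions, Statistics

# REINFORCE loss: minimizing this ascends E[ log πθ(a|s) · Gₜ ]
vpg_loss(policy, states, actions, returns) =
    -mean(logpdf(policy(s), a) * g for (s, a, g) in zip(states, actions, returns))

# gradient w.r.t. the policy's parameters via Zygote:
# grads = gradient(() -> vpg_loss(policy, S, A, G), Flux.params(policy))
```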

as far as i can tell, only off-line λ-return is implemented (`TDλReturnLearner`). any interest in implementing others, such as TD(λ), n-step truncated return, true online TD(λ), and so on? i'm...
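of the methods listed, TD(λ) is the easiest to sketch: the online eligibility-trace form from Sutton & Barto, in illustrative tabular code (not a proposed package interface):

```julia
# tabular TD(λ) with accumulating eligibility traces
function tdλ!(V::Vector, transitions; γ = 0.99, λ = 0.9, α = 0.1)
    e = zero(V)                                  # one trace per state
    for (s, r, s′, done) in transitions
        δ = r + (done ? 0.0 : γ * V[s′]) - V[s]  # TD error
        e[s] += 1                                # accumulate trace at s
        @. V += α * δ * e                        # credit every traced state
        @. e *= γ * λ                            # decay all traces
    end
    V
end
```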

according to RL: An Introduction (page 131), Q-learning should select an action _having already learned from the transition immediately preceding it_. ![image](https://user-images.githubusercontent.com/106585642/174997616-5a507636-2c9c-4d6d-8f0f-8a0d33f6a733.jpeg) this differentiates it from SARSA, which selects an...
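concretely, the ordering in question looks like this (a sketch of a single Q-learning step; SARSA differs exactly in that it commits to the next action before this update):

```julia
# one Q-learning step: learn from the preceding transition, *then* act
function q_step!(Q, s, a, r, s′; γ = 0.99, α = 0.1, ε = 0.1)
    Q[s, a] += α * (r + γ * maximum(Q[s′, :]) - Q[s, a])  # off-policy update
    rand() < ε ? rand(axes(Q, 2)) : argmax(Q[s′, :])      # ε-greedy selection
end
```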

today i was trying to estimate the state values of a policy using off-policy `n`-step TD. as far as i can tell, i need to use a `VBasedPolicy` to represent my...
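for reference, the quantity involved is the n-step return weighted by an importance-sampling ratio (Sutton & Barto, §7.3); a sketch with illustrative names, not the package's API:

```julia
# n-step return: G = Σₖ γ^(k-1) · rₖ + γ^n · V(sₜ₊ₙ)
nstep_return(rs, v_boot; γ = 1.0) =
    sum(γ^(k - 1) * r for (k, r) in enumerate(rs)) + γ^length(rs) * v_boot

# off-policy correction ρ = ∏ π(aₖ|sₖ) / b(aₖ|sₖ); the update is then
# V[s] += α * ρ * (G - V[s])
importance_ratio(π_probs, b_probs) = prod(π_probs ./ b_probs)
```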

`TDLearner(;approximator, γ=1.0, method, n=0)`: the `n` in the constructor is, confusingly, not the number of time steps used, but rather that number minus 1 (so `n = 0` means one-step TD). this off-by-one is really easy to trip over.
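to make the off-by-one concrete (`app` and `some_method` are placeholders, not the package's values):

```julia
# the return spans n + 1 steps, so a 3-step learner needs n = 2
learner = TDLearner(; approximator = app, γ = 1.0, method = some_method, n = 2)
```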

low-hanging fruit for a ton of performance gain. will make a PR when i get the chance