Jeremie Desgagne-Bouchard
Not on a computer right now, but I think you should remove the `reset!` from the loss function and, as a consequence, stick to a custom training loop instead of `train!`.
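To be concrete, here's a minimal sketch of such a loop in the implicit `params` style (the model, data, and optimiser below are placeholders, not the code from this issue); the point is simply that `reset!` lives in the loop, not in the loss:

```julia
using Flux
using Flux.Losses: logitcrossentropy

# Placeholder model and data, just to show the shape of the loop
model = Chain(RNN(4, 16), Dense(16, 2))
data = [([rand(Float32, 4, 8) for _ in 1:10], Flux.onehotbatch(rand(1:2, 8), 1:2))]

loss_fn(X, y) = logitcrossentropy([model(x) for x ∈ X][end], y)   # no reset! in here

ps = Flux.params(model)
opt = ADAM()
for (X, y) in data
    Flux.reset!(model)                      # reset the hidden state once per sequence, in the loop
    gs = gradient(() -> loss_fn(X, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```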
Thanks, I can reproduce. The cause isn't obvious to me, but the behavior seems to indicate that the `reset!` performed within the train loop fails to affect the effective model's state...
Unfortunately, no luck with instantiating the training params within the training loop. The following results in the same accuracy plateau around 60%:

```julia
ps = params(model)
Flux.reset!(model)
gs = gradient(ps)...
```
For reference, this is how you could use the new explicit gradient / Optimisers.jl mode:

```julia
loss_fn1(m, X, y) = logitcrossentropy([m(x) for x ∈ X][end], y)
accuracy1(m, X, y) =...
```
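To make that concrete, here's a rough end-to-end sketch of the explicit mode (the model, shapes, and hyperparameters are invented for illustration; only the shape of the loss follows the snippet above):

```julia
using Flux, Optimisers
using Flux.Losses: logitcrossentropy

# Made-up dimensions: feat features per timestep, seq timesteps, batch samples, 2 classes
feat, seq, batch = 4, 10, 8
model = Chain(RNN(feat => 16), Dense(16 => 2))

X = [rand(Float32, feat, batch) for _ in 1:seq]   # one feat × batch matrix per timestep
y = Flux.onehotbatch(rand(1:2, batch), 1:2)

# Same shape of loss as above: score the output at the last timestep
loss_fn1(m, X, y) = logitcrossentropy([m(x) for x ∈ X][end], y)

function train_explicit!(model, X, y; epochs = 5)
    opt_state = Optimisers.setup(Optimisers.Adam(1f-3), model)
    for _ in 1:epochs
        Flux.reset!(model)                         # reset the hidden state outside the loss
        grads = gradient(m -> loss_fn1(m, X, y), model)[1]
        opt_state, model = Optimisers.update!(opt_state, model, grads)
    end
    return model
end

model = train_explicit!(model, X, y)
```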
There's indeed something fishy going on with the RNN gradients.

```julia
using Flux
layer2 = Flux.Recur(Flux.RNNCell(1, 1, identity))
layer2.cell.Wi .= 5.0
layer2.cell.Wh .= 4.0
layer2.cell.b .= 0f0
layer2.cell.state0 .= 7.0...
```
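For illustration, a manual check along those lines could look like the sketch below (the inputs and the loss are made up; the reference numbers are simply the chain rule applied by hand to this toy cell, and the question is whether the AD gradients agree with them):

```julia
using Flux

# Same toy single-unit cell as above, with identity activation
layer2 = Flux.Recur(Flux.RNNCell(1, 1, identity))
layer2.cell.Wi .= 5.0
layer2.cell.Wh .= 4.0
layer2.cell.b .= 0f0
layer2.cell.state0 .= 7.0

x1, x2 = fill(2f0, 1, 1), fill(3f0, 1, 1)

Flux.reset!(layer2)               # h0 = state0 = 7
ps = Flux.params(layer2)
gs = gradient(ps) do
    layer2(x1)                    # y1 = Wi*x1 + Wh*h0 + b = 5*2 + 4*7 = 38
    sum(layer2(x2))               # y2 = Wi*x2 + Wh*y1 + b
end

# Chain rule by hand: ∂y2/∂Wi = x2 + Wh*x1 = 11, ∂y2/∂Wh = y1 + Wh*h0 = 66, ∂y2/∂b = 1 + Wh = 5
@show gs[layer2.cell.Wi]          # reference value: 11
@show gs[layer2.cell.Wh]          # reference value: 66
@show gs[layer2.cell.b]           # reference value: 5
```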
I'll open a PR by tomorrow to add the above gradient tests. I'm also disappointed not to have taken the time to manually validate those RNN gradients until now. Zygote...
Treating the initial state as learnable parameters is still the default behavior for RNN; nothing was changed in the latest PR. My position on the subject, however, is that the...
Actually, an option to make the learnable initial state more of a first-class citizen could be to use the same approach taken with the `bias` option for the `Dense` layer. So we could...
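To illustrate the idea, here's a purely hypothetical sketch of what such a keyword could look like (`ToyRNNCell` is invented for the example and is not Flux's API); the keyword mirrors `Dense(in, out; bias = false)`:

```julia
# Hypothetical cell: `state0` is either a trainable array or `false` (no learnable initial state)
struct ToyRNNCell{A,V,S}
    Wi::A
    Wh::A
    b::V
    state0::S
end

function ToyRNNCell(in::Integer, out::Integer; state0 = true)
    s0 = state0 === true ? zeros(Float32, out, 1) : state0   # `false` opts out, an array opts in
    ToyRNNCell(randn(Float32, out, in), randn(Float32, out, out), zeros(Float32, out), s0)
end

# Usage, analogous to the Dense convention:
cell_default = ToyRNNCell(4, 8)                  # learnable initial state (today's default)
cell_fixed   = ToyRNNCell(4, 8; state0 = false)  # opts out of the learnable initial state
```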
Thanks for the quick consideration! Unfortunately, the same error is returned after upgrading to the latest version:

```toml
[[deps.Enzyme]]
deps = ["CEnum", "EnzymeCore", "Enzyme_jll", "GPUCompiler", "LLVM", "Libdl", "LinearAlgebra", "ObjectFile", "Preferences", "Printf",...
```
Are you referring to https://github.com/JuliaBinaryWrappers/Enzyme_jll.jl? I noticed that `v0.0.95+0` was released yesterday and used it in the test above. Should I be looking out for `v0.0.96+0`?