Jeremie Desgagne-Bouchard
Not on a computer right now, but I think you should remove the `reset!` from the loss function and, as a consequence, stick to a custom training loop instead of `train!`.
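To be concrete, here's a minimal sketch of such a loop in the implicit `params` style (the model, data, and optimiser below are placeholders, not the code from this issue); the point is simply that `reset!` lives in the loop, not in the loss:

```julia
using Flux
using Flux.Losses: logitcrossentropy

# Placeholder model and data, just to show the shape of the loop
model = Chain(RNN(4, 16), Dense(16, 2))
data = [([rand(Float32, 4, 8) for _ in 1:10], Flux.onehotbatch(rand(1:2, 8), 1:2))]

loss_fn(X, y) = logitcrossentropy([model(x) for x ∈ X][end], y)   # no reset! in here

ps = Flux.params(model)
opt = ADAM()
for (X, y) in data
    Flux.reset!(model)                      # reset the hidden state once per sequence, in the loop
    gs = gradient(() -> loss_fn(X, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```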
Thanks, I can reproduce. The cause isn't obvious to me, but the behavior seems to indicate that the `reset!` performed within the train loop fails to affect the effective model's state...
Unfortunately, no luck with instantiating the training params within the training loop. The following results in the same accuracy plateau around 60%:

```julia
ps = params(model)
Flux.reset!(model)
gs = gradient(ps)...
```
For reference, this is how you could use the new explicit gradient / Optimisers.jl mode:

```julia
loss_fn1(m, X, y) = logitcrossentropy([m(x) for x ∈ X][end], y)
accuracy1(m, X, y) =...
```
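To make that concrete, here's a rough end-to-end sketch of the explicit mode (the model, shapes, and hyperparameters are invented for illustration; only the shape of the loss follows the snippet above):

```julia
using Flux, Optimisers
using Flux.Losses: logitcrossentropy

# Made-up dimensions: feat features per timestep, seq timesteps, batch samples, 2 classes
feat, seq, batch = 4, 10, 8
model = Chain(RNN(feat => 16), Dense(16 => 2))

X = [rand(Float32, feat, batch) for _ in 1:seq]   # one feat × batch matrix per timestep
y = Flux.onehotbatch(rand(1:2, batch), 1:2)

# Same shape of loss as above: score the output at the last timestep
loss_fn1(m, X, y) = logitcrossentropy([m(x) for x ∈ X][end], y)

function train_explicit!(model, X, y; epochs = 5)
    opt_state = Optimisers.setup(Optimisers.Adam(1f-3), model)
    for _ in 1:epochs
        Flux.reset!(model)                         # reset the hidden state outside the loss
        grads = gradient(m -> loss_fn1(m, X, y), model)[1]
        opt_state, model = Optimisers.update!(opt_state, model, grads)
    end
    return model
end

model = train_explicit!(model, X, y)
```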
There's indeed something fishy going on with the RNN gradients.

```julia
using Flux
layer2 = Flux.Recur(Flux.RNNCell(1, 1, identity))
layer2.cell.Wi .= 5.0
layer2.cell.Wh .= 4.0
layer2.cell.b .= 0f0
layer2.cell.state0 .= 7.0...
```
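For illustration, a manual check along those lines could look like the sketch below (the inputs and the loss are made up; the reference numbers are simply the chain rule applied by hand to this toy cell, and the question is whether the AD gradients agree with them):

```julia
using Flux

# Same toy single-unit cell as above, with identity activation
layer2 = Flux.Recur(Flux.RNNCell(1, 1, identity))
layer2.cell.Wi .= 5.0
layer2.cell.Wh .= 4.0
layer2.cell.b .= 0f0
layer2.cell.state0 .= 7.0

x1, x2 = fill(2f0, 1, 1), fill(3f0, 1, 1)

Flux.reset!(layer2)               # h0 = state0 = 7
ps = Flux.params(layer2)
gs = gradient(ps) do
    layer2(x1)                    # y1 = Wi*x1 + Wh*h0 + b = 5*2 + 4*7 = 38
    sum(layer2(x2))               # y2 = Wi*x2 + Wh*y1 + b
end

# Chain rule by hand: ∂y2/∂Wi = x2 + Wh*x1 = 11, ∂y2/∂Wh = y1 + Wh*h0 = 66, ∂y2/∂b = 1 + Wh = 5
@show gs[layer2.cell.Wi]          # reference value: 11
@show gs[layer2.cell.Wh]          # reference value: 66
@show gs[layer2.cell.b]           # reference value: 5
```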
I'll open a PR by tomorrow to add the above gradient tests. I'm also disappointed not to have taken the time to manually validate those RNN gradients until now. Zygote...
Treating the initial state as learnable parameters is still the default behavior for RNN; nothing was changed in the latest PR. My position on the subject, however, is that the...
Actually, an option to make the learnable initial state more of a first-class citizen could be to use the same approach taken with the `bias` option for the `Dense` layer. So we could...
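To illustrate the idea, here's a purely hypothetical sketch of what such a keyword could look like (`ToyRNNCell` is invented for the example and is not Flux's API); the keyword mirrors `Dense(in, out; bias = false)`:

```julia
# Hypothetical cell: `state0` is either a trainable array or `false` (no learnable initial state)
struct ToyRNNCell{A,V,S}
    Wi::A
    Wh::A
    b::V
    state0::S
end

function ToyRNNCell(in::Integer, out::Integer; state0 = true)
    s0 = state0 === true ? zeros(Float32, out, 1) : state0   # `false` opts out, an array opts in
    ToyRNNCell(randn(Float32, out, in), randn(Float32, out, out), zeros(Float32, out), s0)
end

# Usage, analogous to the Dense convention:
cell_default = ToyRNNCell(4, 8)                  # learnable initial state (today's default)
cell_fixed   = ToyRNNCell(4, 8; state0 = false)  # opts out of the learnable initial state
```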
Thanks for the quick consideration! Unfortunately, the same error is returned after upgrading to the latest version:

```toml
[[deps.Enzyme]]
deps = ["CEnum", "EnzymeCore", "Enzyme_jll", "GPUCompiler", "LLVM", "Libdl", "LinearAlgebra", "ObjectFile", "Preferences", "Printf",...
```
Are you referring to https://github.com/JuliaBinaryWrappers/Enzyme_jll.jl? I noticed that `v0.0.95+0` was released yesterday and used it in the test above. Should I be looking out for `v0.0.96+0`?