Michael Abbott
Can reproduce. I note that commenting out `BatchNorm` removes the discrepancy, and that inserting `trainmode!(model)` produces this error:

```julia
ERROR: MethodError: no method matching Float32(::ForwardDiff.Dual{ForwardDiff.Tag{var"#19#20", Float32}, Float32, 12})
Stacktrace:
...
```
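For reference, a minimal sketch of how to hit this (the model and sizes here are made up, not taken from the issue): in train mode, `BatchNorm` tries to write `Dual` numbers into its `Float32` running statistics, which is what raises the `MethodError` above.

```julia
using Flux, ForwardDiff

model = Chain(Dense(2 => 3), BatchNorm(3))
Flux.trainmode!(model)  # force the running-stats update on the forward pass

x = rand(Float32, 2, 4)  # 4 samples
ForwardDiff.gradient(x -> sum(model(x)), x)
# ERROR: MethodError: no method matching Float32(::ForwardDiff.Dual{...})
```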
I think we can harmlessly just insert `value` into that broadcast. It's ForwardDiff-specific but maybe worth having? For automatic train-mode, if we do something like https://github.com/FluxML/NNlib.jl/pull/434 then we can have...
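Concretely, the `value` idea could look something like this (a sketch only; `_maybe_value` is a made-up helper name, and the real broadcast lives inside Flux's norm code):

```julia
using ForwardDiff

# Strip Duals down to their primal value before they reach the Float32
# running statistics; a no-op for ordinary numbers.
_maybe_value(x) = x
_maybe_value(x::ForwardDiff.Dual) = ForwardDiff.value(x)

# e.g. in the stats update, broadcast the helper over the new batch mean:
# μ .= (1 - mtm) .* μ .+ mtm .* _maybe_value.(μnew)
```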
I had to check, but NNlib is much lighter than ForwardDiff, even if that moves to StaticArraysCore. But it does load Requires, so that might be fine:

```
julia> @time_imports...
```
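(For anyone re-running the check, the comparison is of this form, in a Julia ≥ 1.8 REPL; timings vary by machine:)

```julia
julia> @time_imports using NNlib

julia> @time_imports using ForwardDiff
```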
Besides detecting whether you are within AD (also an issue for dropout), the problem with BatchNorm is that ForwardDiff runs the forward pass several times (chunked mode, for any large...
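To see the several-passes behaviour concretely (an illustrative toy, not from the issue), note that any stats update inside the model would run once per chunk:

```julia
using ForwardDiff

calls = Ref(0)
f(x) = (calls[] += 1; sum(abs2, x))  # count forward evaluations

ForwardDiff.gradient(f, rand(Float32, 100))
calls[]  # > 1; here ⌈100/12⌉ = 9 passes with the default chunk size
```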
For what it's worth, here's the result on master:

```julia
julia> loss3(m, x, y) = Flux.Losses.mse(m(x), y);

julia> state = Flux.setup(Descent(), predict)
(weight = Leaf(Descent{Float64}(0.1), nothing), bias = Leaf(Descent{Float64}(0.1), nothing),...
```
> ["Overview"](https://fluxml.ai/Flux.jl/stable/models/overview/) which should be renamed to "First steps" or "What is machine learning?" > ["Basics"](https://fluxml.ai/Flux.jl/stable/models/basics/) which should be renamed to "Getting started with neural networks" or "What is a...
Pluto is great. Do you have in mind something small enough that, if it re-runs training repeatedly, waiting for it isn't too much of a pain? Not quite sure how...
Should we consider wrapping these two in a `Scale` sub-layer, replaced with `identity` when absent? Perhaps that would also allow removing the `affine` field.
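Roughly this shape (a sketch; `NormWrap` is a made-up name, and real norm layers carry more state):

```julia
using Flux, Statistics

# The affine part lives in its own sub-layer: a Flux.Scale when wanted,
# plain `identity` when not, so no separate `affine::Bool` field is needed.
struct NormWrap{S}
    scale::S
end
(m::NormWrap)(x) = m.scale((x .- mean(x; dims=2)) ./ (std(x; dims=2) .+ 1f-5))

withaffine = NormWrap(Flux.Scale(3))  # learnable γ and β
without    = NormWrap(identity)       # nothing stored, nothing computed

x = rand(Float32, 3, 4)
withaffine(x); without(x)             # same forward code for both
```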
Xref https://github.com/JuliaML/MLUtils.jl/issues/123 (about moving & renaming) and #1992 (about NaN from batch of 1).
For me this is not as quick locally, but maybe just a quirk of my CPU? Edit: fixed now.

```julia
julia> f_57(n) = sum(i -> 4/i, (-2 * n +...
```