Michael Abbott
This seems to cause problems with non-ASCII user folder names on Windows, which came up here: https://stackoverflow.com/questions/79032378/big-problems-caused-by-chinese-usernames-in-julia
> Which backends can actually discard-but-return this extra_data without calling the function twice? FWIW, [`Zygote.withgradient`](https://fluxml.ai/Zygote.jl/dev/utils/#Zygote.withgradient) supports this. And `Flux.withgradient` supports it with Enzyme.jl too, using [this code](https://github.com/FluxML/Flux.jl/blob/master/ext/FluxEnzymeExt/FluxEnzymeExt.jl#L63-L83) which at present...
That's right? `return loss, extra_data` is clearly about auxiliary data, as in the title. And the docstring I linked to says "Allows you to capture auxillary outputs, in addition to...
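For reference, a minimal sketch of the `return loss, extra_data` pattern with `Zygote.withgradient`, assuming a Zygote release recent enough to accept tuple-returning functions. The function name `loss_with_aux` and the input values are made up for illustration:

```julia
using Zygote

# Hypothetical function: the scalar loss comes first, the auxiliary data second.
function loss_with_aux(x)
    z = 1 ./ x
    return sum(z), z
end

out = Zygote.withgradient(loss_with_aux, [1.0, 2.0, 4.0])

loss, extra_data = out.val   # the function's whole return value
grad = out.grad[1]           # gradient of the first element (the loss) only
```

The function is called once: the gradient is taken with respect to the first returned element, and the rest is passed through untouched in `out.val`.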
At first glance looks good! I'd like to look closely at how adjoint vectors get handled, as that was the tricky case before. I wonder whether `boxdot!(C, A, B)` can...
So the problem in #48 is `Tangent`, which Optimisers.jl wants to access via `getproperty`. Not sure that's a great design but it won't change soon. For Functors to use properties-not-fields...
One barrier to properties-everywhere is that Zygote uses fields:
```julia
julia> using ProtoStructs, Zygote

julia> @proto struct Mine
           num::Float64
       end

julia> gradient(x -> x.num^2, Mine(3.0))[1]
(properties = (num = 6.0,),)...
```
Structured arrays do seem like a concern, as #33 required a hand-written inverse, impossible for some types. We could easily exclude them from a traverse-anything scheme.
It's tricky. Perhaps there need to be methods of `batched_mul` accepting these >3 dimension BatchedAdjoint types, so that the reshape affects the wrapped Array (or CuArray) rather than composing another...
Agree this would be better, although changing it is going to break code.
For the immediate problem of `gelu`, Flux could simply add methods to all NNlib activation functions. I don't remember quite why we decided against `missing` for this, maybe because it...