Brian Chen
Thanks for the update. I'd say my comment at https://github.com/FluxML/Flux.jl/pull/2397#discussion_r1525712174 still applies, but if you feel strongly about it I'd recommend leaving that section out for now and we can...
I think what happened is that a couple of file names changed, so what should've been a clear ask became slightly more confusing. Just to put it in words though, the...
> `custom_layers.md` doesn't have any models that are defined via `struct ... end`; all the models there are chains and other blocks provided by Flux, so I cannot reuse any...
I think this is a dupe of https://github.com/FluxML/NNlib.jl/issues/523?
Can you try pulling `y .^ 2` and `ŷ .^ 2` in https://github.com/FluxML/Flux.jl/blob/20d516bc29a98adeb3e831c382ff0e805f6a0b33/src/losses/functions.jl#L519 out on their own lines and seeing which one fails?
Putting a backlink to https://github.com/FluxML/Flux.jl/issues/2096 because this work should close that.
I think we were waiting for a couple more features to land so we could have parity with some of the remaining use cases people might use implicit params for....
I don't have time to comment on this in detail now (will do so later), but the decision to diverge from PyTorch was not made lightly. IIRC it was something...
Ok, I did some more digging into why PyTorch decided to couple the learning rate and weight decay coefficient for their AdamW implementation. My best guess is that [this comment](https://github.com/pytorch/pytorch/pull/3740#issuecomment-460077904)...
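To make the coupling question concrete, here's a minimal sketch (not Flux's or PyTorch's actual code; all names are illustrative) of the two AdamW weight-decay conventions on a single scalar parameter: PyTorch scales the decay term by the learning rate, while the fully decoupled reading applies the decay coefficient on its own.

```python
def adamw_step(p, m, v, g, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2, couple_lr_and_wd=True):
    """One AdamW update on scalar parameter p with gradient g at step t.

    If couple_lr_and_wd is True, the decay term is lr * wd * p
    (PyTorch's convention); otherwise it is wd * p (the fully
    decoupled reading, which Flux historically followed).
    """
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    mhat = m / (1 - beta1 ** t)
    vhat = v / (1 - beta2 ** t)
    # The only point of divergence: is weight decay scaled by lr?
    decay = lr * wd if couple_lr_and_wd else wd
    p = p - lr * mhat / (vhat ** 0.5 + eps) - decay * p
    return p, m, v
```

The practical consequence: with coupling, any learning-rate schedule silently rescales the effective weight decay, whereas in the decoupled form `wd` means the same thing regardless of `lr`, so the same coefficient value produces very different regularization strength under the two conventions.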
I actually opened an issue on the Optax repo about this, and they more or less said they wanted to copy PyTorch...