Kyle Daruwalla
True, that's better!
Small side note: I would make the initialization method the first argument to support the `do` syntax, in case anyone needs it.
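To illustrate why argument order matters here: Julia's `do` syntax passes the trailing block as the *first* positional argument, so a constructor that takes its initializer first lets callers define one inline. A minimal sketch, where `make_weights` is a hypothetical stand-in, not the actual constructor under discussion:

```julia
# `do` blocks desugar to `make_weights(f, dims...)`, so the
# initializer must be the first positional argument.
function make_weights(init, dims...)
    return init(dims...)
end

# Without `do`: pass a named initializer.
w1 = make_weights(zeros, 2, 3)

# With `do`: define a custom initializer inline.
w2 = make_weights(2, 3) do dims...
    fill(0.5, dims...)
end
```

If the initializer were a keyword or trailing argument instead, the `do` form would not compose at all.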
The semantic definition of `bias = false` is that trying to load a numeric value into it is ignored. I think that extends to `reweight!` too.
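A sketch of that loading rule: numeric values are copied into array parameters, while a `false` bias (the "no bias" sentinel) silently swallows them. `loadparam!` here is a hypothetical helper for illustration, not the real Flux API:

```julia
# Copy numeric values into a real parameter array.
loadparam!(dst::AbstractArray, src) = copyto!(dst, src)

# bias = false is a structural "no bias" marker: incoming
# numeric values are ignored rather than raising an error.
loadparam!(dst::Bool, src) = dst
```

The same dispatch pattern would make any parameter-mutating utility, `reweight!` included, a no-op on `bias = false` for free.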
This is great! Looking ahead to the website step, I think it would be better and simpler to just host a "model zoo" or "tutorials" section in the Flux docs...
I posted my [long form comment](https://discourse.julialang.org/t/julias-broadcast-vs-jaxs-vmap/38990/37) on discourse, but here are the parts related to this discussion:
- I think to get the performance of fused BLAS operations, the user...
Something like [this](https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/.60jax.2Evmap.60.20vs.20Julia.20Broadcast/near/196838817) where "depth" refers to the depth in the call stack. Does KA do this? That's awesome!
Go for it! For both ViT-based models, make sure to look at the existing ViT implementation and the Layers submodules. Torchvision models are good reference, but ultimately we want something...
Btw the docs issue is now taken care of. Do we want the link in the README to point to the dedicated docs?
We already have support for turning off the batch norm [here](https://github.com/FluxML/Metalhead.jl/blob/88bcacb81b643066adecdc9bb108a6dbcaad9e3c/src/layers/conv.jl#L15-L17). The remaining task on this issue is to update the code for GoogLeNet to use that functionality.
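The pattern in question can be sketched as follows (illustrative only, not Metalhead's exact API): the block constructor takes a flag and swaps the normalization layer for `identity` when it is off.

```julia
# Stand-in for BatchNorm; any normalizing function works here.
scalenorm(x) = x ./ sum(abs, x)

# When normalization is disabled, `identity` takes its place,
# so the rest of the block is written once for both cases.
function conv_block(conv; use_norm::Bool = true)
    norm = use_norm ? scalenorm : identity
    return x -> norm(conv(x))
end
```

Updating GoogLeNet would then just mean threading that flag through its block constructors instead of hard-coding the norm layer.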
Deprecate Flux.Optimise and implicit parameters in favour of Optimisers.jl and explicit parameters
And regularization, e.g. https://github.com/FluxML/Optimisers.jl/pull/57
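A self-contained sketch of the explicit-parameter style this deprecation moves toward, with regularization composed as a gradient transform in the spirit of Optimisers.jl's `OptimiserChain(WeightDecay(λ), Descent(η))`. This is a tiny re-implementation for illustration, not the package itself:

```julia
struct Descent; eta::Float64; end
struct WeightDecay; lambda::Float64; end

# Each rule transforms the gradient explicitly; no global
# implicit-params state is involved.
apply(o::WeightDecay, w, g) = g .+ o.lambda .* w   # L2 penalty as a gradient term
apply(o::Descent, w, g)     = o.eta .* g           # scale by the learning rate

# Thread the gradient through the rule chain, then take the step.
function update(rules::Tuple, w, g)
    for r in rules
        g = apply(r, w, g)
    end
    return w .- g
end
```

Because the weights and gradients are passed around as plain values, regularizers like weight decay become just another composable rule rather than a term bolted onto the loss.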