Kyle Daruwalla
True, as I think about this more, `Descent` clearly implies only one thing. Asking for `Ascent` to be defined is very reasonable. Okay so maybe I'm more accepting that this...
For simple ops like convolution, this works, and it's exactly what ONNX.jl does on dev (except with NNlib instead of Flux). But the example above features MHA layers which has...
Not noise, the whole reason for the issue is to discuss 😀
The functional approach that you link to is discussed a bit in https://github.com/FluxML/Optimisers.jl/pull/49. The final solution probably will be something like your description—walking the model and marking certain leaf nodes...
> This is intriguing, but would you not still have to store some information about which layers are frozen into the model itself? E.g. by having `@freeze Dense(2,4)` return `Frozen(Dense(2,4))`...
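To make the wrapper idea in the quote concrete, here is a minimal sketch in plain Julia. It is not how Flux or Optimisers.jl implement freezing. `Frozen`, `@freeze`, `ToyDense`, and `trainable` are all hypothetical names defined here for illustration, and `ToyDense` is only a stand-in for `Dense`. The sketch shows one consequence of the wrapper approach: the frozen state lives in the model itself, and a walk over the model simply skips anything wrapped in `Frozen`.

```julia
# Hypothetical wrapper: marks a layer as frozen without changing what it computes.
struct Frozen{T}
    layer::T
end

# Calling a frozen layer forwards to the wrapped layer.
(f::Frozen)(x) = f.layer(x)

# Hypothetical macro so that `@freeze layer` returns `Frozen(layer)`.
macro freeze(ex)
    :(Frozen($(esc(ex))))
end

# Toy stand-in for a dense layer (an assumption, not Flux.Dense).
struct ToyDense
    weight::Matrix{Float64}
    bias::Vector{Float64}
end
ToyDense(nin::Integer, nout::Integer) = ToyDense(zeros(nout, nin), zeros(nout))
(d::ToyDense)(x) = d.weight * x .+ d.bias

# Walk the model and collect the arrays that should still be trained.
trainable(x::AbstractArray{<:Number}) = Any[x]  # numeric arrays are the leaves
trainable(::Frozen) = Any[]                     # skip everything under a frozen node
function trainable(x)                           # otherwise recurse into the fields
    params = Any[]
    for name in fieldnames(typeof(x))
        append!(params, trainable(getfield(x, name)))
    end
    return params
end

# A two-layer "model" stored as a tuple; only the second layer is trainable.
model = (@freeze(ToyDense(2, 4)), ToyDense(4, 1))
ps = trainable(model)
```

Here `ps` holds only the weight and bias of the second layer, while `model[1]` can still be called like the layer it wraps. The alternative discussed in this thread, keeping the frozen markers outside the model, would replace the `Frozen` method with a lookup in a separate structure.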
Just for reference, what's the approach here? Are you following the steps in https://github.com/FluxML/ML-Coordination-Tracker/issues/9 ? So, is this to track hot fixes to get things running, and later we'll refresh...
Sure, did you mean zoo tutorials?
I agree that a single Project.toml and Manifest.toml is better. Allowing certain tutorials to fall out of compatibility with the latest Flux is not what we want. Every tutorial should...
My only concern is that a user needs to install lots of packages to run one specific example. Maybe it is better to have a script that automates bumping Flux...
While there is nothing wrong with what's already in the PR, this will need all the "things to add" like a README, toml, complete training script, etc. before it is...