
freezingdocs

Open isentropic opened this issue 2 years ago • 8 comments

Rewrite of the previous pull request https://github.com/FluxML/Flux.jl/pull/2385. God, I hate git. @ToucheSir

isentropic avatar Mar 13 '24 05:03 isentropic

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 73.94%. Comparing base (9061b79) to head (d3b800b).

:exclamation: Current head d3b800b differs from pull request most recent head 4331dc8

Please upload reports for the commit 4331dc8 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
+ Coverage   45.59%   73.94%   +28.34%     
===========================================
  Files          32       32               
  Lines        1895     1911       +16     
===========================================
+ Hits          864     1413      +549     
+ Misses       1031      498      -533     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 13 '24 05:03 codecov[bot]

@ToucheSir @mcabbott sorry for the delay, I got carried away. I updated the docs as you requested; let me know your new comments. Hopefully this section can help with some of the frustrations I had as a new user.

isentropic avatar Jun 28 '24 07:06 isentropic

Thanks for the update. I'd say my comment at https://github.com/FluxML/Flux.jl/pull/2397#discussion_r1525712174 still applies, but if you feel strongly about it I'd recommend leaving that section out for now and we can look at merging the rest.

ToucheSir avatar Jun 28 '24 22:06 ToucheSir

Oh, I meant to incorporate your comment; maybe I didn't completely understand what you meant. By "prominent links", do you mean links in the subheadings directly?

  • I feel like I did the "move to advanced" part of your comments.

Could you state more clearly what you suggested? I'll be happy to incorporate it.

isentropic avatar Jun 29 '24 03:06 isentropic

I think what happened is that a couple of file names changed, so what should've been a clear ask became slightly more confusing. Just to put it in words, though: the ask is to move the content under the sections "Static freezing per model definition" and "Excluding from model definition" to https://github.com/FluxML/Flux.jl/blob/master/docs/src/guide/models/custom_layers.md. That way, we co-locate code snippets that talk about model definition instead of making users look across multiple pages for them.

To ensure users know that these examples are also useful for excluding parameters from training, the freezing docs page can then link to the two headings under custom_layers.md. Kind of a "see also", with maybe a couple of sentences explaining why the linked sections are relevant.

ToucheSir avatar Jul 04 '24 14:07 ToucheSir

I tried to do this the way you suggested, but it feels even more haphazard that way. custom_layers.md doesn't have any models that are defined via struct ... end; all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions. I really tried to merge it somehow, but it just feels bad.

Another reason is that the file I'm contributing, 'misc-model-tweaking', is now partitioned again, and there is no single place to understand the differences between the various ways to freeze. The whole point seems gone: I tried to summarize this already spread-apart information into a single page, and your suggestion seems to split it again.

I suggest merging it as is. Adding a new page shouldn't hurt much, because it is very focused and specific to model freezing.

isentropic avatar Jul 15 '24 08:07 isentropic

custom_layers.md doesn't have any models that are defined via struct ... end; all the models there are chains and other blocks provided by Flux, so I cannot reuse any definitions.

That doesn't sound like https://fluxml.ai/Flux.jl/stable/guide/models/custom_layers/. Every section on that page has an example layer defined with struct ... end.
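For readers following along, this is roughly what those layers look like. A minimal sketch in the style of that guide page; the `Affine` name, fields, and sizes here are illustrative, not copied from the docs:

```julia
using Flux

# A custom layer defined with `struct ... end`, as in custom_layers.md
struct Affine
  W
  b
end

# Constructor initialising the parameters
Affine(in::Integer, out::Integer) = Affine(randn(Float32, out, in), zeros(Float32, out))

# Make instances callable, so the struct behaves as a layer
(m::Affine)(x) = m.W * x .+ m.b

# Opt in to Flux's layer machinery (parameter collection, printing, etc.)
Flux.@layer Affine

layer = Affine(3, 2)
y = layer(rand(Float32, 3))  # a length-2 output vector
```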

Another reason is that now the file i'm contributed 'misc-model-tweaking' is partitioned again, and there is no single place to understand about the differences of various ways to freeze.

The fundamental tension here is about what "freezing" means. In my mind, freezing parameters of a layer or model implies the ability to "thaw" those parameters later. Dynamic freezing using freeze! and thaw! does this, for example. This is also in line with the language and expectations around freezing in PyTorch and JAX (which links to https://optax.readthedocs.io/en/latest/api/optimizer_wrappers.html#masked-update).

In contrast, excluding model fields from training is treated differently. In this case, the non-trainable-ness or non-parameter-ness (depending on the library) of an array is an inherent property of it. One cannot thaw it, nor is it stopped from updating (as "frozen" might imply), because it can still be mutable: think BatchNorm running stats. It's illustrative that the APIs libraries use here are named and documented distinctly from the ones they have for freezing during training. See for example PyTorch's register_buffer, or the parameter vs state dichotomy in Flax. Flux follows a similar philosophy with Flux.@layer MyLayer trainable=(fields...,).
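A sketch of that exclusion mechanism in Flux (the `Scaler` layer and its fields are made up for illustration; only the `trainable=` keyword is the real API):

```julia
using Flux

struct Scaler
  scale   # trainable parameter
  count   # mutable state, never trained (cf. BatchNorm's running stats)
end

# Applying the layer updates the state but only `scale` enters the gradient step
(m::Scaler)(x) = (m.count .+= 1; m.scale .* x)

# Only `scale` is exposed to optimisers; `count` is carried along as plain state
Flux.@layer Scaler trainable=(scale,)
```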

ToucheSir avatar Jul 15 '24 15:07 ToucheSir

I honestly tried to do it your way, but I hated how it looked in the end. I just thought one more standalone docs page is okay, as long as it's all related, plus you can delete this one page later with minimal effort.

I am in the shoes of a learner of the library, and to me this PR feels cleaner and better than moving things all over the place. I even indicate at the top of the page that the topics are a bit disconnected. If you still feel otherwise, please change it however you deem appropriate.

You can take the commits partially or as a whole, or close the PR altogether. At this point I don't want to argue about a few written paragraphs anymore.

I thank both of you (@ToucheSir @mcabbott) for feedback given in the previous thread, and also the tips you've given me in the slack channel.

isentropic avatar Jul 17 '24 03:07 isentropic