Brian Chen
> So I think really what we want is a serialization api that serializes a whole state (possibly by `fmap`ing or something) in a way that respects object identity, and...
Something BSON-ish, something HDF5-ish, maybe exploring newer DL-focused formats like https://github.com/huggingface/safetensors.
Backwards compat aside, is there a need for `update!` if we have `update!!`? Just to throw an error if immutable params are encountered, I guess?
AFAICT the only missing part is functionality to convert a tree of gradients from Zygote into a tree of tuples. What are your thoughts about implementing that?
Yes, higher-order terms in particular. The current interface on master puts the `state` first for `update(o, state, x::T, x̄s...)` and `apply(o, state, x, dxs...)`. Those were changed to `update(o, x::T,...
A big reason AdaHessian and other optimizers that use the HVP/Hessian diagonal (or an approximation thereof) haven't been ported is that they rely on nested differentiation. If we can...
I think based on https://github.com/FluxML/Optimisers.jl/pull/135#issuecomment-1518122404 everything should be `state` or `opt_state`? @darsnack can correct my recollection if I'm wrong.
Optax uses `opt_state` in their docs and `state` internally in their rules. They have it easier though because rules are "vectorized" and thus no function is really dealing with `Leaf`...
[Optax's](https://optax.readthedocs.io/en/latest/api.html#masked-update) design may be of interest here. They of course can get away with making everything immutable. However, if we think of a masked state tree as a temporary view...
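
For anyone who hasn't looked at the masked-update idea, here is a rough sketch of it in plain Python. This is not Optax's implementation or API: `masked_update`, `sgd_rule`, and the nested-dict trees are all made-up names for illustration, and the rule is a bare SGD step. It only shows the immutable flavour, where masked-out leaves pass through untouched and a new tree is returned.

```python
def sgd_rule(param, grad, lr=0.5):
    """Hypothetical per-leaf rule: one plain SGD step."""
    return param - lr * grad


def masked_update(params, grads, mask, rule=sgd_rule):
    """Return a new tree; leaves whose mask entry is False are passed through."""
    if isinstance(params, dict):
        # Recurse in lockstep through the three trees, which share one structure.
        return {k: masked_update(params[k], grads[k], mask[k], rule) for k in params}
    return rule(params, grads) if mask else params


params = {"dense": {"w": 1.0, "b": 0.5}, "embed": {"w": 2.0}}
grads = {"dense": {"w": 1.0, "b": 1.0}, "embed": {"w": 1.0}}
mask = {"dense": {"w": True, "b": True}, "embed": {"w": False}}

new_params = masked_update(params, grads, mask)
```

Because nothing is mutated, `params` is still intact afterwards, which is the part that is easy for an immutable design and harder once leaves carry mutable state.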
How common are state tree merges? If you make `Frozen` a wrapper type, then the modification is reversible.

> Good question. First, what's the desired behaviour here at all? If...
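
To make the "wrapper type is reversible" point concrete, a minimal sketch in plain Python. Everything here is hypothetical (`Frozen`, `freeze`, `thaw`, `step`, and the dict-shaped leaf state are illustrative names, not an existing API). The assumption is that freezing wraps the leaf state instead of replacing it, so the momentum survives and thawing gets back exactly what was there before.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Frozen:
    """Hypothetical wrapper marking a leaf as not-to-be-updated."""
    inner: Any  # the original leaf state, kept intact


def freeze(leaf):
    # Wrapping is idempotent: an already frozen leaf is returned as-is.
    return leaf if isinstance(leaf, Frozen) else Frozen(leaf)


def thaw(leaf):
    # Unwrapping returns the very same object that was wrapped.
    return leaf.inner if isinstance(leaf, Frozen) else leaf


def step(leaf, grad, lr=0.5, beta=0.5):
    """Momentum step on one leaf; frozen leaves are skipped unchanged."""
    if isinstance(leaf, Frozen):
        return leaf
    momentum = beta * leaf["momentum"] + grad
    return {"param": leaf["param"] - lr * momentum, "momentum": momentum}


leaf = {"param": 1.0, "momentum": 2.0}
frozen = freeze(leaf)
skipped = step(frozen, 1.0)   # no update while frozen
thawed = thaw(frozen)         # original state, momentum included
updated = step(thawed, 1.0)   # updates resume from the old momentum
```

The trade-off is that every function walking the state tree now has to know about the wrapper, which is where the question about merges comes in.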