swift-apis
Provide a way to access non-differentiable model state from `Optimizer.update`.
@dominikgrewe pointed out that K-FAC needs to access the model's non-differentiable internal state from `Optimizer.update(_:along:)`. My initial idea is to change the optimizer API to the following:
```swift
public protocol Optimizer {
    associatedtype Model: Layer
    associatedtype Scalar: FloatingPoint
    var learningRate: Scalar { get }
    mutating func update(_ variables: inout Model, along gradient: Model.CotangentVector)
}
```
But this comes at the cost of complicating concrete optimizers' implementation of `update(_:along:)`:
- Too long:

  ```swift
  model.allDifferentiableVariables.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)
  ```

- Too too long:

  ```swift
  model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)
  ```
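To make the verbosity concrete, here is a minimal sketch of what a concrete optimizer's update loop would look like under this API. The `Variables` and `Model` types and the hand-rolled key-path list are plain-Swift stand-ins (in swift-apis the key paths would come from `KeyPathIterable`'s `recursivelyAllWritableKeyPaths(to:)`); the hyperparameter values are arbitrary.

```swift
// Plain-Swift stand-ins for illustration only.
struct Variables {
    var w: Float = 1.0
    var b: Float = 0.0
    // Stand-in for `recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)`.
    static let keyPaths: [WritableKeyPath<Variables, Float>] = [\.w, \.b]
}
struct Model {
    var allDifferentiableVariables = Variables()
}

var model = Model()
let firstMoments = Variables(w: 1.0, b: 0.0)
let secondMoments = Variables(w: 1.0, b: 0.0)
let stepSize: Float = 0.1
let epsilon: Float = 1e-8

for kp in Variables.keyPaths {
    // Every read-modify-write must spell out the full chain through `model`:
    model.allDifferentiableVariables[keyPath: kp] -=
        stepSize * firstMoments[keyPath: kp]
            / (secondMoments[keyPath: kp].squareRoot() + epsilon)
}
// w: 1.0 - 0.1 * 1.0 / (1.0 + 1e-8) ≈ 0.9; b stays 0.0.
```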
And we obviously can't assign `model.allDifferentiableVariables` to a local variable, because we need setter access. Something like `inout var modelVariables = model.allDifferentiableVariables` is not possible in Swift yet, until the ownership model and related features get fleshed out.
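The write-back problem follows from plain value semantics, as this small sketch shows (the simplified `Model` is hypothetical; `allDifferentiableVariables` here is an ordinary computed property standing in for the real one):

```swift
struct Parameters { var w: Float = 1.0 }
struct Model {
    private var parameters = Parameters()
    // Stand-in for `allDifferentiableVariables`; a getter alone is not enough.
    var allDifferentiableVariables: Parameters {
        get { parameters }
        set { parameters = newValue }
    }
}

var model = Model()
var localVars = model.allDifferentiableVariables  // copies the value
localVars.w -= 0.5                                // mutates only the copy
// model.allDifferentiableVariables.w is still 1.0 here.

model.allDifferentiableVariables.w -= 0.5         // writes back via the setter
// model.allDifferentiableVariables.w is now 0.5.
```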
Another issue is that making `model` be `inout` is not semantically accurate: we don't want to mutate a model's non-differentiable states in an optimizer.
Maybe what we really need is optimizer-specific protocols that require model states, which models would implement. The specific optimizers would have such a generic constraint and take these states as initializer parameters.
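As a rough sketch of that direction (all names here, such as `KFACModelState` and `curvatureState`, are hypothetical, not a real swift-apis API): the model exposes its optimizer-relevant state via a protocol, and the optimizer constrains its `Model` to it and receives the state at initialization.

```swift
// Hypothetical optimizer-specific protocol that models would implement.
protocol KFACModelState {
    associatedtype CurvatureState
    var curvatureState: CurvatureState { get }
}

struct KFAC<Model: KFACModelState> {
    var learningRate: Float
    // Non-differentiable state is supplied up front instead of being
    // read from the model inside `update(_:along:)`.
    var curvature: Model.CurvatureState
    init(learningRate: Float, curvature: Model.CurvatureState) {
        self.learningRate = learningRate
        self.curvature = curvature
    }
}

// A toy model conforming to the optimizer-specific protocol.
struct Dense: KFACModelState {
    var curvatureState: [Float] = [1.0, 2.0]
}

let model = Dense()
let optimizer = KFAC<Dense>(learningRate: 0.1, curvature: model.curvatureState)
```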
> Another issue is that making `model` be `inout` is not semantically accurate: we don't want to access model's non-differentiable states in an optimizer.
I wonder what you mean by this. It seems we do actually want to access a model's non-differentiable states in an optimizer.
Did you mean "we don't want to mutate a model's non-differentiable states in an optimizer"?
Thanks for pointing that out. Fixed.
> - Too too long:
>   `secondMoments.allDifferentiableVariables[keyPath: kp] = secondMoments.allDifferentiableVariables[keyPath: kp] * beta2 + (1 - beta2) * gradient[keyPath: kp] * gradient[keyPath: kp]`
Can't variables like `secondMoments` still be of type `Model.AllDifferentiableVariables`?
Ah yes, I used the wrong example here. What I really meant to change to demonstrate the verbosity is any setter access through `model`, which now becomes `model.allDifferentiableVariables[keyPath: kp] ...`:

```swift
model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)
```
Changed the protocol as I demonstrated above. We will discuss further if the verbosity becomes a concern.
Actually, it's starting to become a problem. `Differentiable.allDifferentiableVariables` should not be required to have a setter. See this mailing list thread for the reasons why: https://groups.google.com/a/tensorflow.org/d/msg/swift/HsPnSh4DFac/8511F4ENAQAJ.
When that's fixed, we can't really use an `inout` `Model`'s `.allDifferentiableVariables` to update the model in the optimizer, so we'd still have to pass `Model.AllDifferentiableVariables` to `update(_:along:)` as an `inout` argument. I'm gonna go ahead and revert the PR for now. To write optimizers that depend on non-differentiable model states, we can make the optimizer initializer take whatever states it needs to depend on (just not the entire model, to avoid accidental copies).
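Under those constraints, the reverted shape might look roughly like this (simplified types, not the actual swift-apis `Optimizer` protocol; `scale` is a made-up piece of non-differentiable state):

```swift
// Sketch only: `update` receives just the differentiable variables
// `inout`, not the whole model.
struct Variables { var w: Float = 2.0 }

struct MyOptimizer {
    var learningRate: Float
    // Non-differentiable state the optimizer depends on, taken at
    // initialization rather than read from the model inside `update`.
    var scale: Float

    mutating func update(_ variables: inout Variables, along gradient: Variables) {
        variables.w -= learningRate * scale * gradient.w
    }
}

var variables = Variables()
var optimizer = MyOptimizer(learningRate: 0.5, scale: 2.0)
optimizer.update(&variables, along: Variables(w: 1.0))
// w: 2.0 - 0.5 * 2.0 * 1.0 = 1.0
```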