swift-apis
Provide a way to access non-differentiable model state from `Optimizer.update`.
@dominikgrewe pointed out that K-FAC needs to access the model's non-differentiable internal state from `Optimizer.update(_:along:)`. My initial idea is to change the optimizer API to the following:
```swift
public protocol Optimizer {
    associatedtype Model: Layer
    associatedtype Scalar: FloatingPoint
    var learningRate: Scalar { get }
    mutating func update(_ variables: inout Model, along gradient: Model.CotangentVector)
}
```
But this comes at the cost of complicating concrete optimizers' implementation of `update(_:along:)`:
- Too long:

  ```swift
  model.allDifferentiableVariables.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)
  ```

- Too too long:

  ```swift
  model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)
  ```
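To make the verbosity concrete, here is a minimal sketch of what a concrete optimizer's update loop would look like under this API. The `Variables` and `Model` types and the hand-rolled key-path list are plain-Swift stand-ins (in swift-apis the key paths would come from `KeyPathIterable`'s `recursivelyAllWritableKeyPaths(to:)`); the hyperparameter values are arbitrary.

```swift
// Plain-Swift stand-ins for illustration only.
struct Variables {
    var w: Float = 1.0
    var b: Float = 0.0
    // Stand-in for `recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self)`.
    static let keyPaths: [WritableKeyPath<Variables, Float>] = [\.w, \.b]
}
struct Model {
    var allDifferentiableVariables = Variables()
}

var model = Model()
let firstMoments = Variables(w: 1.0, b: 0.0)
let secondMoments = Variables(w: 1.0, b: 0.0)
let stepSize: Float = 0.1
let epsilon: Float = 1e-8

for kp in Variables.keyPaths {
    // Every read-modify-write must spell out the full chain through `model`:
    model.allDifferentiableVariables[keyPath: kp] -=
        stepSize * firstMoments[keyPath: kp]
            / (secondMoments[keyPath: kp].squareRoot() + epsilon)
}
// w: 1.0 - 0.1 * 1.0 / (1.0 + 1e-8) ≈ 0.9; b stays 0.0.
```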
And we obviously can't assign `model.allDifferentiableVariables` to a local variable, because we need setter access. Something like `inout var modelVariables = model.allDifferentiableVariables` is not possible in Swift yet, until the ownership model and related features get fleshed out.
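The write-back problem follows from plain value semantics, as this small sketch shows (the simplified `Model` is hypothetical; `allDifferentiableVariables` here is an ordinary computed property standing in for the real one):

```swift
struct Parameters { var w: Float = 1.0 }
struct Model {
    private var parameters = Parameters()
    // Stand-in for `allDifferentiableVariables`; a getter alone is not enough.
    var allDifferentiableVariables: Parameters {
        get { parameters }
        set { parameters = newValue }
    }
}

var model = Model()
var localVars = model.allDifferentiableVariables  // copies the value
localVars.w -= 0.5                                // mutates only the copy
// model.allDifferentiableVariables.w is still 1.0 here.

model.allDifferentiableVariables.w -= 0.5         // writes back via the setter
// model.allDifferentiableVariables.w is now 0.5.
```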
Another issue is that making `model` be `inout` is not semantically accurate: we don't want to mutate a model's non-differentiable states in an optimizer.
Maybe what we really need is optimizer-specific protocols that require model states, which models would implement. The specific optimizers would have such a generic constraint and take these states as initializer parameters.
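As a rough sketch of that direction (all names here, such as `KFACModelState` and `curvatureState`, are hypothetical, not a real swift-apis API): the model exposes its optimizer-relevant state via a protocol, and the optimizer constrains its `Model` to it and receives the state at initialization.

```swift
// Hypothetical optimizer-specific protocol that models would implement.
protocol KFACModelState {
    associatedtype CurvatureState
    var curvatureState: CurvatureState { get }
}

struct KFAC<Model: KFACModelState> {
    var learningRate: Float
    // Non-differentiable state is supplied up front instead of being
    // read from the model inside `update(_:along:)`.
    var curvature: Model.CurvatureState
    init(learningRate: Float, curvature: Model.CurvatureState) {
        self.learningRate = learningRate
        self.curvature = curvature
    }
}

// A toy model conforming to the optimizer-specific protocol.
struct Dense: KFACModelState {
    var curvatureState: [Float] = [1.0, 2.0]
}

let model = Dense()
let optimizer = KFAC<Dense>(learningRate: 0.1, curvature: model.curvatureState)
```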
> Another issue is that making `model` be `inout` is not semantically accurate: we don't want to access model's non-differentiable states in an optimizer.
I wonder what you mean by this. It seems we do actually want to access a model's non-differentiable states in an optimizer.
Did you mean "we don't want to mutate a model's non-differentiable states in an optimizer"?
Thanks for pointing that out. Fixed.
> - Too too long:
>   `secondMoments.allDifferentiableVariables[keyPath: kp] = secondMoments.allDifferentiableVariables[keyPath: kp] * beta2 + (1 - beta2) * gradient[keyPath: kp] * gradient[keyPath: kp]`
Can't variables like `secondMoments` still be of type `Model.AllDifferentiableVariables`?
Ah yes, I used the wrong example here. What I really meant to change to demonstrate the verbosity is any setter access through `model`, which now becomes `model.allDifferentiableVariables[keyPath: kp] ...`:

```swift
model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)
```
Changed the protocol as I demonstrated above. We will discuss further if the verbosity becomes a concern.
Actually, it's starting to become a problem. `Differentiable.allDifferentiableVariables` should not be required to have a setter. See this mailing list thread for the reasons why: https://groups.google.com/a/tensorflow.org/d/msg/swift/HsPnSh4DFac/8511F4ENAQAJ.
When that's fixed, we can't really use an `inout` `Model`'s `.allDifferentiableVariables` to update the model in the optimizer, so we'd still have to pass `Model.AllDifferentiableVariables` to `update(_:along:)` as an `inout` argument. I'm gonna go ahead and revert the PR for now. To write optimizers that depend on non-differentiable model states, we can make the optimizer initializer take whatever states it needs to depend on (just not the entire model, to avoid accidental copies).
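Under those constraints, the reverted shape might look roughly like this (simplified types, not the actual swift-apis `Optimizer` protocol; `scale` is a made-up piece of non-differentiable state):

```swift
// Sketch only: `update` receives just the differentiable variables
// `inout`, not the whole model.
struct Variables { var w: Float = 2.0 }

struct MyOptimizer {
    var learningRate: Float
    // Non-differentiable state the optimizer depends on, taken at
    // initialization rather than read from the model inside `update`.
    var scale: Float

    mutating func update(_ variables: inout Variables, along gradient: Variables) {
        variables.w -= learningRate * scale * gradient.w
    }
}

var variables = Variables()
var optimizer = MyOptimizer(learningRate: 0.5, scale: 2.0)
optimizer.update(&variables, along: Variables(w: 1.0))
// w: 2.0 - 0.5 * 2.0 * 1.0 = 1.0
```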