
Provide a way to access non-differentiable model state from `Optimizer.update`.

Open rxwei opened this issue 5 years ago • 6 comments

@dominikgrewe pointed out that K-FAC needs to access the model's non-differentiable internal state from Optimizer.update(_:along:). My initial idea is to change the optimizer API to the following:

public protocol Optimizer {
    associatedtype Model: Layer
    associatedtype Scalar: FloatingPoint
    var learningRate: Scalar { get }
    mutating func update(_ variables: inout Model, along gradient: Model.CotangentVector)
}

But this comes at the cost of complicating concrete optimizers' implementation of update(_:along:):

  • Too long: model.allDifferentiableVariables.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self).
  • Too too long: model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon).

And we obviously can't assign model.allDifferentiableVariables to a local variable, because we need setter access to the original model. Something like inout var modelVariables = model.allDifferentiableVariables is not possible in Swift until the ownership model and related features get fleshed out.
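The value-semantics problem above can be sketched in plain Swift (simplified, hypothetical types; the real API involves Tensor and key paths):

```swift
// Sketch: why assigning the variables to a local doesn't help.
// `Variables` and `Model` are stand-ins for AllDifferentiableVariables
// and a real Layer; both are value types.
struct Variables { var w: Float }
struct Model { var vars = Variables(w: 1.0) }

var model = Model()
var localVars = model.vars   // a value copy, not an inout binding
localVars.w -= 0.1           // mutates only the copy

// model.vars.w is still 1.0; real writes must go through the full
// model.allDifferentiableVariables[keyPath: kp] access path, which is
// what makes each update line so long.
assert(model.vars.w == 1.0)
```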

Another issue is that making model be inout is not semantically accurate: we don't want to mutate a model's non-differentiable states in an optimizer.

Maybe what we really need is optimizer-specific protocols that require model states, which models will implement. The specific optimizers will have such a generic constraint, and take these states as initializer parameters.
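A hedged sketch of that idea in plain Swift (all names are hypothetical; a real optimizer would additionally be constrained to Layer/Differentiable conformances):

```swift
// Sketch: an optimizer-specific protocol exposing the non-differentiable
// state the optimizer needs, so update(_:along:) never has to reach into
// (or mutate) the model itself.
protocol KFACStateProvider {
    // e.g. running curvature estimates; simplified to [Float] here
    var curvatureState: [Float] { get }
}

struct MyModel: KFACStateProvider {
    var weights: [Float] = [0.1, 0.2]
    var curvatureState: [Float] = [1.0, 1.0]
}

struct KFACOptimizer<Model: KFACStateProvider> {
    var state: [Float]
    init(for model: Model) {
        // The optimizer takes the state it needs at initialization,
        // per the generic constraint above.
        self.state = model.curvatureState
    }
}

let opt = KFACOptimizer(for: MyModel())
```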

rxwei avatar Feb 12 '19 20:02 rxwei

Another issue is that making model be inout is not semantically accurate: we don't want to access model's non-differentiable states in an optimizer.

I wonder what you mean by this. It seems we do actually want to access a model's non-differentiable states in an optimizer.

Did you mean "we don't want to mutate a model's non-differentiable states in an optimizer"?

dan-zheng avatar Feb 12 '19 20:02 dan-zheng

Thanks for pointing that out. Fixed.

rxwei avatar Feb 12 '19 20:02 rxwei

  • Too too long: secondMoments.allDifferentiableVariables[keyPath: kp] = secondMoments.allDifferentiableVariables[keyPath: kp] * beta2 + (1 - beta2) * gradient[keyPath: kp] * gradient[keyPath: kp].

Can't variables like secondMoments still be of type Model.AllDifferentiableVariables?

dominikgrewe avatar Feb 12 '19 21:02 dominikgrewe

Ah yes, I used the wrong example there. What I really meant to show, to demonstrate the verbosity, is any setter access through model, which now becomes model.allDifferentiableVariables[keyPath: kp] ....

model.allDifferentiableVariables[keyPath: kp] -= stepSize * firstMoments[keyPath: kp] / (sqrt(secondMoments[keyPath: kp]) + epsilon)

rxwei avatar Feb 12 '19 21:02 rxwei

Changed the protocol as I demonstrated above. We will discuss further if the verbosity becomes a concern.

rxwei avatar Feb 13 '19 00:02 rxwei

Actually it starts to become a problem. Differentiable.allDifferentiableVariables should not be required to have a setter. See the mailing list thread for reasons why that's the case: https://groups.google.com/a/tensorflow.org/d/msg/swift/HsPnSh4DFac/8511F4ENAQAJ.

When that's fixed, we can't really use an inout Model's .allDifferentiableVariables to update the model in the optimizer. So we'd still have to pass Model.AllDifferentiableVariables to update(_:along:) as an inout binding. I'm gonna go ahead and revert the PR for now. To write optimizers that depend on non-differentiable model states, we can actually make the optimizer initializer take whatever states it depends on (just not the entire model, to avoid accidental copies).
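A minimal sketch of that post-revert shape in plain Swift (simplified, hypothetical types: `Variables` stands in for Model.AllDifferentiableVariables, and `scale` for whatever non-differentiable state the optimizer needs):

```swift
struct Variables { var w: Float }

struct SGDLike {
    let learningRate: Float
    let scale: Float   // stand-in for non-differentiable model state

    // The initializer takes only the state the optimizer depends on,
    // not the whole model, so no accidental copies of model values.
    init(scale: Float, learningRate: Float = 0.1) {
        self.scale = scale
        self.learningRate = learningRate
    }

    // update(_:along:) again takes the differentiable variables as inout,
    // rather than the model itself.
    func update(_ variables: inout Variables, along gradient: Variables) {
        variables.w -= learningRate * scale * gradient.w
    }
}

var vars = Variables(w: 1.0)
let opt = SGDLike(scale: 2.0)
opt.update(&vars, along: Variables(w: 0.5))
// vars.w ≈ 1.0 - 0.1 * 2.0 * 0.5 ≈ 0.9
```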

rxwei avatar Feb 14 '19 01:02 rxwei