Math Protocols
@rxwei @dan-zheng I wasn't sure where to put this, but I believe an issue here is a good place to collect our thoughts and comments. My initial thoughts are:
Pointwise Multiplicative
I have a couple of comments about `PointwiseMultiplicative`:
- Similar to how `AdditiveArithmetic` defines `-` and `-=`, I believe we should define `/` and `/=` for `PointwiseMultiplicative`, thus enabling efficient division and making this protocol dual to `AdditiveArithmetic`. It may not be very formal, but given that most of the math-related protocols are not and are more geared towards practicality, I think this is fine. Also, for our purposes these protocols are used over aggregates of tensors, where `/` and `/=` can be defined, so this change should be fine. What do you think? (See the sketch below this list.)
- Following from point 1, if we aim for consistency with the standard library we may want to call this `MultiplicativeArithmetic`, or rename `AdditiveArithmetic` to `PointwiseAdditive`. I personally prefer the latter, since it would also allow for consistency with e.g. `PointwiseComparative`, but I'm not sure how that would go over with the Swift evolution process.
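For concreteness, here is a minimal sketch of what this could look like. It assumes the protocol's existing requirements are `one`, `reciprocal`, `.*`, and `.*=` as in swift-apis today, and spells the new operator `./` to parallel `.*` (plain `/` as suggested above would work the same way):

```swift
// Operator declarations included only so the sketch is self-contained;
// `.*` and `.*=` are already declared in swift-apis.
infix operator .* : MultiplicationPrecedence
infix operator .*= : AssignmentPrecedence
infix operator ./ : MultiplicationPrecedence
infix operator ./= : AssignmentPrecedence

public protocol PointwiseMultiplicative: AdditiveArithmetic {
  static var one: Self { get }
  var reciprocal: Self { get }
  static func .* (lhs: Self, rhs: Self) -> Self
  static func .*= (lhs: inout Self, rhs: Self)

  // Proposed additions, dual to `-` and `-=` on `AdditiveArithmetic`.
  static func ./ (lhs: Self, rhs: Self) -> Self
  static func ./= (lhs: inout Self, rhs: Self)
}

extension PointwiseMultiplicative {
  // Default implementations in terms of `reciprocal`; conformers such as
  // `Tensor` would override these with a native, more efficient division.
  public static func ./ (lhs: Self, rhs: Self) -> Self { lhs .* rhs.reciprocal }
  public static func ./= (lhs: inout Self, rhs: Self) { lhs = lhs ./ rhs }
}
```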
Optimizers
In order to simplify the remaining optimizers, we need to add support for comparisons (e.g., `max`) and for computing the absolute value of tensors element-wise.
For comparisons, I believe something along the lines of the following would be great:
```swift
public protocol PointwiseEquatable {
  associatedtype Boolean
  static func == (lhs: Self, rhs: Self) -> Boolean
}

public protocol PointwiseComparable: PointwiseEquatable {
  static func < (lhs: Self, rhs: Self) -> Boolean
  static func <= (lhs: Self, rhs: Self) -> Boolean
  static func > (lhs: Self, rhs: Self) -> Boolean
  static func >= (lhs: Self, rhs: Self) -> Boolean
  static func max(lhs: Self, rhs: Self) -> Self
  static func min(lhs: Self, rhs: Self) -> Self
}
```
I'm not sure about the absolute value, but I believe we may be able to do something like:
```swift
public protocol PointwiseMagnitude { // ???
  func abs() -> Self
}
```
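As a motivating sketch, here is roughly what an AdaMax-style infinity-norm accumulator update (u ← max(β₂ · u, |g|)) could look like when written once, generically, over a model's tangent vector. It assumes `max` returns `Self` element-wise, the hypothetical `abs()` requirement above, and `VectorProtocol`'s `scaled(by:)`:

```swift
// Sketch: one step of an AdaMax-style accumulator, generic over any tangent
// vector that supports scaling, element-wise `max`, and element-wise `abs`.
func updateInfinityNorm<TangentVector: VectorProtocol & PointwiseComparable & PointwiseMagnitude>(
  _ norm: inout TangentVector,
  gradient: TangentVector,
  beta2: Float
) where TangentVector.VectorSpaceScalar == Float {
  norm = TangentVector.max(lhs: norm.scaled(by: beta2), rhs: gradient.abs())
}
```

Without something like `PointwiseComparable` and `PointwiseMagnitude`, this step currently has to be written by looping over key paths to concrete tensor types.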
Reductions
We need some way to perform reductions over tensor aggregates. This comes up quite a lot in machine learning. For example, we often want to know the max over all elements in an aggregate. Or, for a more practical motivating example, consider clipping gradients based on the global norm over the aggregate structure. This would require us to compute the norm of each tensor in the aggregate (`norm[t]`) and then compute:

```
globalNorm = sqrt(sum([norm[t].squared() for t in tensors]))
```
Say we can compute `sqrt(_:)` and `squared()` using a conformance to `ElementaryFunctions`. How do we go about the sum reduction over the aggregate?
Adding support for reductions introduces a couple of challenges. First, we would need to know the `Scalar` type of all tensors in the structure and force it to be the same for all of them. Alternatively, we could follow an approach similar to `VectorProtocol` and use `Float` for all tensors. However, in that case wouldn't we lose precision when dealing with, say, `Double` tensors (this problem also applies to `VectorProtocol`, actually, so how do you handle it there?)? We could avoid this by having a `Scalar` type (which would also require all layers to define a `Scalar` type -- @rxwei, you mentioned though that we want to avoid this to potentially allow for mixed-precision training). In either case, I believe this is worth a discussion.
Also, reducing over an aggregate would require a protocol that looks something like this:
```swift
public protocol Reducible {
  associatedtype Scalar

  func sum() -> Scalar where Scalar: AdditiveArithmetic
  func mean() -> Scalar where Scalar: AdditiveArithmetic
  func product() -> Scalar where Scalar: PointwiseMultiplicative
  // ... more reductions, such as comparison-based reductions.

  // This needs to be used by `_meanHelper()`, for example.
  func count() -> Scalar

  // The following are needed for applying the reduction across the reduced members.
  static func _sumHelper(_ x: Scalar, _ y: Scalar) -> Scalar where Scalar: AdditiveArithmetic
  static func _meanHelper(_ x: Scalar, _ y: Scalar) -> Scalar where Scalar: AdditiveArithmetic
  static func _productHelper(_ x: Scalar, _ y: Scalar) -> Scalar where Scalar: PointwiseMultiplicative
}
```
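To make the intent of the `_*Helper` requirements concrete, here is what a hand-written conformance could look like for a toy two-tensor aggregate, i.e. roughly what a derived conformance would synthesize (a sketch; the struct and member names are made up, and it assumes the per-requirement `where` clauses above are satisfied by fixing `Scalar` to `Tensor<Float>`):

```swift
struct DenseParameters: Reducible {
  var weight: Tensor<Float>
  var bias: Tensor<Float>

  // Reducing a tensor member yields a scalar-shaped `Tensor<Float>`.
  typealias Scalar = Tensor<Float>

  // Reduce each member, then fold the per-member results with the helper.
  func sum() -> Scalar { Self._sumHelper(weight.sum(), bias.sum()) }
  func product() -> Scalar { Self._productHelper(weight.product(), bias.product()) }
  func count() -> Scalar { Scalar(Float(weight.scalarCount + bias.scalarCount)) }
  func mean() -> Scalar { sum() / count() }

  static func _sumHelper(_ x: Scalar, _ y: Scalar) -> Scalar { x + y }
  static func _meanHelper(_ x: Scalar, _ y: Scalar) -> Scalar { x + y }
  static func _productHelper(_ x: Scalar, _ y: Scalar) -> Scalar { x * y }
}
```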
This seems overly complicated, so maybe we can find a better solution? One nice thing about using a `Scalar` type is that it may remove the need for a `Reducible` protocol by allowing users to perform reductions manually using `KeyPathIterable`. For example, my current implementation of clipping by global norm looks like this:
```swift
extension KeyPathIterable {
  public mutating func clipByGlobalNorm<Scalar: TensorFlowFloatingPoint>(clipNorm: Scalar) {
    let clipNorm = Tensor<Scalar>(clipNorm)
    var globalNorm = Tensor<Scalar>(zeros: [])
    for kp in self.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self) {
      globalNorm += self[keyPath: kp].squared().sum()
    }
    globalNorm = sqrt(globalNorm)
    for kp in self.recursivelyAllWritableKeyPaths(to: Tensor<Scalar>.self) {
      self[keyPath: kp] *= clipNorm / max(globalNorm, clipNorm)
    }
  }
}
```
Of course, it doesn't have to be defined as an extension to `KeyPathIterable`, but I use this for now because I cannot yet define it as an extension to `Layer.TangentVector`.
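For illustration, usage on a model's gradient would then look roughly like this (a sketch; `model`, `optimizer`, `batch`, and the loss function are placeholders, and it assumes the tangent vector conforms to `KeyPathIterable` and the usual `valueWithGradient(at:)` / `update(_:along:)` APIs):

```swift
// Sketch: clip the gradient by global norm in place before the optimizer step.
let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
  meanSquaredError(predicted: model(batch.features), expected: batch.labels)
}
var clippedGradient = gradient
clippedGradient.clipByGlobalNorm(clipNorm: Float(1.0))
optimizer.update(&model, along: clippedGradient)
print("loss: \(loss)")
```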
What are your thoughts on the above? Also, why do we call `VectorProtocol` that, instead of `VectorSpace`?
Thanks for the write-up! I'll tag #322 and the `Numeric.Magnitude` and `AdditiveArithmetic` discussions on the evolution forum so that people looking at this issue can also get some context from those threads.
Yeah, everything seems like a good direction for the standard library up until the reduction example, which I think is a great motivating example for having library-customizable conformance derivation sooner. Let's continue the two Swift Evolution discussions.
By the way, reading the `AdditiveArithmetic` discussions, I tend to agree with having a separate protocol called `PointwiseAdditive` which has the same requirements but is derived differently. In this case, memberwise automatic derivation kicks in only for `PointwiseAdditive`, and this fits nicely with the "pointwise" semantics, as "pointwise" sort of implies "memberwise".
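(For reference, such a protocol would presumably just mirror the `AdditiveArithmetic` requirements, with the memberwise derivation attached to it instead; a sketch:)

```swift
public protocol PointwiseAdditive {
  static var zero: Self { get }
  static func + (lhs: Self, rhs: Self) -> Self
  static func += (lhs: inout Self, rhs: Self)
  static func - (lhs: Self, rhs: Self) -> Self
  static func -= (lhs: inout Self, rhs: Self)
}
```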
Regarding reductions, I believe Swift-side type class derivation support would be amazing and would definitely be an elegant solution to this.
FWIW, I'd love to see a bigger discussion of this, because I tend to agree that we should be using something different than the core language protocols for our layer abstractions.
Among other things, we have multiprecision floating point to deal with in our optimizers, and the core protocols really aren't designed to deal with that. I'd love to see a discussion about these topics on the mailing list...
Happy to start a discussion on the mailing list going forward!