Investigate optimizer numerical correctness vs Python reference implementations

Open dan-zheng opened this issue 4 years ago • 2 comments

https://github.com/tensorflow/swift-apis/pull/758 adds Python TensorFlow reference implementations for optimizer numerical correctness.

This issue tracks numerical differences between Swift optimizer implementations and the reference implementations. See references to TF-759 in Tests/TensorFlowTests/OptimizerTests.swift for occurrences.


Some differences are larger than others. Given the same optimizer parameters, I think we should strive for exact numerical equality where possible.

Current examples:

  • SGD(for: values, learningRate: 1e-3): big difference
    • Swift: [0.49999535, -0.10000112, -3.000017]
    • Python: [ 0.49999967, -0.00999999, -0.01999998]
  • AdaGrad(for: values, learningRate: 1e-3, epsilon: 1e-7): big difference for the third value
    • Swift: [0.061354622, -0.057095252, -0.061786927]
    • Python: [ 0.06179592, -0.05709525, -0.05987222]
  • AdaMax(for: values, learningRate: 1e-3, epsilon: 1e-7): small difference
    • Swift: [0.9999907, -0.99999064, -0.9999907]
    • Python: [ 0.99999076, -0.99999064, -0.99999064]
  • Adam(for: values, learningRate: 1e-3, epsilon: 1e-7): smallest difference
    • Swift: [0.9999906, -0.9999898, -0.99999064]
    • Python: [ 0.9999907, -0.9999898, -0.9999904]
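
One way to put numbers on the labels above is to compute, per optimizer, the largest absolute difference and the largest float32 ULP distance between the two result vectors. The following is only a sketch: the values are copied from the list above, and the helper names (`ordered_float32_bits`, `ulp_distance`, `max_abs_diff`, `max_ulp_diff`) are hypothetical and not part of swift-apis or TensorFlow.

```python
import struct

# Values copied from the list above: (Swift result, Python reference result).
RESULTS = {
    "SGD": ([0.49999535, -0.10000112, -3.000017],
            [0.49999967, -0.00999999, -0.01999998]),
    "AdaGrad": ([0.061354622, -0.057095252, -0.061786927],
                [0.06179592, -0.05709525, -0.05987222]),
    "AdaMax": ([0.9999907, -0.99999064, -0.9999907],
               [0.99999076, -0.99999064, -0.99999064]),
    "Adam": ([0.9999906, -0.9999898, -0.99999064],
             [0.9999907, -0.9999898, -0.9999904]),
}

def ordered_float32_bits(x):
    """Round x to float32 and map its bit pattern onto an order-preserving integer line."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude

def ulp_distance(a, b):
    """Number of float32 steps separating a and b (0 means the same float32 value)."""
    return abs(ordered_float32_bits(a) - ordered_float32_bits(b))

def max_abs_diff(swift, python):
    """Largest element-wise absolute difference between the two vectors."""
    return max(abs(s - p) for s, p in zip(swift, python))

def max_ulp_diff(swift, python):
    """Largest element-wise float32 ULP distance between the two vectors."""
    return max(ulp_distance(s, p) for s, p in zip(swift, python))

for name, (swift, python) in RESULTS.items():
    print(f"{name}: max |diff| = {max_abs_diff(swift, python):.3g}, "
          f"max ULP distance = {max_ulp_diff(swift, python)}")
```

On the values quoted above, the AdaMax and Adam vectors differ by at most 1 and 4 float32 ULPs respectively, while the SGD and AdaGrad gaps are far larger than float32 rounding alone could explain.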

dan-zheng, Mar 19 '20 19:03

Is it possible that the small differences come from each language's default floating-point precision? Swift's Float is 32-bit (Float32), while Python's float is 64-bit (Float64).

vballoli, Mar 20 '20 19:03

The Python TensorFlow optimizer reference implementations use tf.float32 precision, which should match Swift: https://github.com/tensorflow/swift-apis/blob/b7a9e56efc08f683733433ba3c7eee4966570213/Utilities/ReferenceImplementations/optimizers.py#L16-L17


These example float32 programs produce exactly the same output, which gives me hope that exact numerical equality is attainable:

Python:

import tensorflow as tf
x = tf.constant(1, dtype=tf.float32)
dx = tf.constant(0.1, dtype=tf.float32)
for _ in range(1000):
  x += dx
print(x.numpy())
# 100.99903

Swift:

var x: Float = 1
let dx: Float = 0.1
for _ in 0..<1000 {
  x += dx
}
print(x)
// 100.99903
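
For contrast with the float32 loops above, the same accumulation can be run in both float32 and float64 to see how large a genuine precision mismatch would be. This is a sketch in plain Python, not something taken from the reference implementations: `to_float32` and `accumulate` are hypothetical helpers, and float32 arithmetic is emulated by rounding each float64 sum back to float32 through `struct`.

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(start, step, count, round_fn):
    """Add `step` to `start` `count` times, applying `round_fn` after every addition."""
    x = round_fn(start)
    dx = round_fn(step)
    for _ in range(count):
        x = round_fn(x + dx)
    return x

# Emulated float32 arithmetic: every intermediate result is rounded to float32.
f32_total = accumulate(1.0, 0.1, 1000, to_float32)
# Native float64 arithmetic: no extra rounding.
f64_total = accumulate(1.0, 0.1, 1000, lambda v: v)

print(f"float32: {f32_total:.5f}")
print(f"float64: {f64_total:.5f}")
print(f"gap:     {f64_total - f32_total:.2e}")
```

In this sketch the float32 total ends up roughly 1e-3 below the float64 total, which shows how a precision mismatch would accumulate over many steps. Since the reference implementations use tf.float32, such a mismatch is not expected to be the cause here.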

dan-zheng, Mar 20 '20 20:03