swift-numerics

Inconsistent results between macOS and Linux

Open ewconnell opened this issue 4 years ago • 2 comments

Hi, my framework uses the Numerics package. We developed a model using the SwiftRT API and a second version using TensorFlow. On the Mac, both versions produce exactly the same results. Those results were recorded and put into the unit tests to ensure a constant point of reference.

On Ubuntu and gLinux, TensorFlow produces exactly the same results as on the Mac. SwiftRT using the Numerics package produces nearly the same results, with differences of up to 0.001. Is the Numerics package relying on system library functions that are inconsistent across platforms?

If so, this is a big problem. Any insights?

Thanks, Ed

ewconnell, Feb 04 '20 22:02

Is there an example that reproduces the problem?

SusanDoggie, Feb 05 '20 07:02

Yes, the system math library functions do not generally produce identical results on different platforms. It's a possible long-term goal for Swift Numerics to provide portable implementations, but it's very long-term¹. If you need exactly reproducible math library results in the short term, you will need to link against a math library that you control for that specific purpose. One option is crlibm, an open-source correctly rounded math library. Because it's correctly rounded, it is necessarily portable, but it is significantly slower (occasionally orders of magnitude slower) than a "normal" system math library.

  1. I've written a system math library three times (Darwin's for x86_64, arm, and arm64). I did not attempt to deliver identical results across these platforms, because our goal is to deliver the best library we can for each platform. A truly portable math library requires one of:
     a. significant compromises to performance
     b. significant compromises to accuracy
     c. a baseline of Haswell for x86 (to ensure availability of hardware FMA), plus modest compromises of performance.
     You also need to decide how to handle fixing bugs; fixing bugs is good, but it will perturb results, which is contrary to the goal of bitwise-exact reproducibility. There needs to be a policy in place to handle this scenario, and you need to design an API to support it.

stephentyrone, Feb 05 '20 12:02