libm icon indicating copy to clipboard operation
libm copied to clipboard

Investigate jn and jnf randomly falling on CI

Open Schultzer opened this issue 6 years ago • 6 comments

Hi guys,

jn and jnf are randomly failing on ARM.

On master Build

cargo test --features checked musl-reference-tests --target arm-unknown-linux-gnueabi --release
...
failures:

---- jn_matches_musl stdout ----
thread 'main' panicked at 'INPUT: [114, 4594974205335009568] EXPECTED: [1617394868955] ACTUAL 0.0', /target/arm-unknown-linux-gnueabi/release/build/libm-a8dc3a3490b5fe96/out/musl-tests.rs:108:17
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.


failures:
    jn_matches_musl

On #169 Build

cargo test --features checked musl-reference-tests --target arm-unknown-linux-gnueabi
...

failures:

---- jn_matches_musl stdout ----
thread 'main' panicked at 'INPUT: [136, 4602429132083530282] EXPECTED: [15924833] ACTUAL 0.0', /target/arm-unknown-linux-gnueabi/debug/build/libm-a6d3571cbabba738/out/musl-tests.rs:108:17
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

---- jnf_matches_musl stdout ----
thread 'main' panicked at 'INPUT: [29, 1061546867] EXPECTED: [86] ACTUAL 0.0', /target/arm-unknown-linux-gnueabi/debug/build/libm-a6d3571cbabba738/out/musl-tests.rs:16:17


failures:
    jn_matches_musl
    jnf_matches_musl

Questions

  • Is it related to the use of wrapping_add and wrapping_sup?
  • Are the algorithms in musl/newlib relying on overflows?

Schultzer avatar May 17 '19 19:05 Schultzer

Just to add why I believe this is some kind of random fluke. https://dev.azure.com/benjamin0103/benjamin/_build/results?buildId=9

Schultzer avatar May 17 '19 19:05 Schultzer

You can dbg! these fns with INPUT on different targets and find difference.

burrbull avatar May 17 '19 21:05 burrbull

Thanks I fixed the test in jnf but it had a cascading effect so the root to this is deep but it might lead me in the right direction.

Schultzer avatar May 17 '19 22:05 Schultzer

It looks like my previous diagnosis was wrong and I can actually reproduce this locally. Tests such as:

diff --git a/src/math/jn.rs b/src/math/jn.rs
index 1be167f..f70ad02 100644
--- a/src/math/jn.rs
+++ b/src/math/jn.rs
@@ -341,3 +341,24 @@ pub fn yn(n: i32, x: f64) -> f64 {
         b
     }
 }
+
+#[cfg(test)]
+mod tests {
+    #[test]
+    fn test_from_ci() {
+        let ret = super::jn(
+            136,
+            f64::from_bits(4602429132083530282),
+        );
+        assert_eq!(ret, f64::from_bits(15924833));
+    }
+
+    #[test]
+    fn test_from_ci2() {
+        let ret = super::jn(
+            114,
+            f64::from_bits(4594974205335009568),
+        );
+        assert_eq!(ret, f64::from_bits(1617394868955));
+    }
+}
diff --git a/src/math/jnf.rs b/src/math/jnf.rs
index 360f62e..d19657c 100644
--- a/src/math/jnf.rs
+++ b/src/math/jnf.rs
@@ -257,3 +257,15 @@ pub fn ynf(n: i32, x: f32) -> f32 {
         b
     }
 }
+
+#[cfg(test)]
+mod tests {
+    #[test]
+    fn test_from_ci() {
+        let ret = super::jnf(
+            29,
+            f32::from_bits(1061546867),
+        );
+        assert_eq!(ret, f32::from_bits(86));
+    }
+}

when run with

$ rustup run nightly ./ci/run-docker.sh arm-unknown-linux-gnueabi

does indeed have the same failures:

failures:

---- math::jn::tests::test_from_ci stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007867913`', src/math/jn.rs:353:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

---- math::jn::tests::test_from_ci2 stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007990992405106`', src/math/jn.rs:362:9

---- math::jnf::tests::test_from_ci stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `0.00000000000000000000000000000000000000000012`', src/math/jnf.rs:269:9


failures:
    math::jn::tests::test_from_ci
    math::jn::tests::test_from_ci2
    math::jnf::tests::test_from_ci

would be good to investigate and see what's going on!

In the meantime though we'll likely want to disable/remove these intrinsics until we can sort through the errors

alexcrichton avatar May 20 '19 14:05 alexcrichton

  • Is it related to the use of wrapping_add and wrapping_sup?

  • Are the algorithms in musl/newlib relying on overflows?

@Schultzer , Sorry for the confusion. This code was directly translated without any thought put in.

Believe it or not, both musl and newlib rely on overflows in those functions, but that definitely should not affect this code.

Here's the original code:

	if ((ix | (lx|-lx)>>31) > 0x7ff00000) /* nan */
		return x;

-lx will overflow if lx is unsigned and not zero. However, this check is trivial to rewrite in simpler terms without using wrapping_add, because (lx | -lx) >> 31 is equivalent to (lx != 0) as u32:

diff --git a/src/math/jn.rs b/src/math/jn.rs
index 1be167f..194bf41 100644
--- a/src/math/jn.rs
+++ b/src/math/jn.rs
@@ -53,8 +53,7 @@ pub fn jn(n: i32, mut x: f64) -> f64 {
     sign = (ix >> 31) != 0;
     ix &= 0x7fffffff;

-    // -lx == !lx + 1
-    if (ix | (lx | ((!lx).wrapping_add(1))) >> 31) > 0x7ff00000 {
+    if x.is_nan() {
         /* nan */
         return x;
     }
@@ -267,8 +266,7 @@ pub fn yn(n: i32, x: f64) -> f64 {
     sign = (ix >> 31) != 0;
     ix &= 0x7fffffff;

-    // -lx == !lx + 1
-    if (ix | (lx | ((!lx).wrapping_add(1))) >> 31) > 0x7ff00000 {
+    if x.is_nan() {
         /* nan */
         return x;
     }

m1el avatar Jun 03 '19 16:06 m1el

Hi @m1el, Thanks for the insight.

I've been researching a bit and haven't really been able to come to any conclusion.

One of the things I've done was to try to make a test suite inspired from GCC libm test suite, it's on this branch add-test-suite. Right know it's only passing locally on my macbook so I'm not sure it's the right way to go.

I have a hunch that is properly due some codegen in rustc. It would be cool if it's possible to evaluate the assembly of musl functions and our implementation.

Schultzer avatar Jun 03 '19 19:06 Schultzer