aHash icon indicating copy to clipboard operation
aHash copied to clipboard

one test failure on s390x-unknown-linux-gnu (big endian)

Open decathorpe opened this issue 2 years ago • 7 comments

I'm responsible for builds of ahash on Fedora Linux. With the latest bugfixes for LLVM and Rust, I was finally able to re-enable running the test suite during our builds, and I have now found a failure that only occurrs on s390x (IBM System Z), the only big-endian architecture we support:

failures:
---- operations::test::test_add_length stdout ----
thread 'operations::test::test_add_length' panicked at 'assertion failed: `(left == right)`
  left: `18446744073709551614`,
 right: `18446744073709551615`', src/operations.rs:380:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
    operations::test::test_add_length

All other tests pass on s390x-unknown-linux-gnu. All tests succeed on the other architectures we support:

  • x86_64-unknown-linux-gnu
  • i686-unknown-linux-gnu
  • aarch64-unknown-linux-gnu
  • powerpc64le-unknown-linux-gnu
  • armv7-unknown-linux-gnueabihf

Tests were run with Rust 1.67.1 on Fedora Linux.

decathorpe avatar Feb 21 '23 13:02 decathorpe

@decathorpe This is very confusing. Is it the case that this fails on s390x

    fn test_or() {
        assert_eq!(((u64::MAX as u128) << 64 | 50) >> 64, u64::MAX as u128);
    }    

If so, is that a cpu bug?

tkaitchuck avatar Oct 31 '23 04:10 tkaitchuck

Well, the bit pattern for 50u128 is different on little-endian and big-endian systems, by definition ... doing bitwise operations with hard-coded constants like this is bound to cause endianness related bugs.

I tried writing a small program for the test case above:

fn main() {
    println!("50u128 (LE): {:?}", 50u128.to_le_bytes());
    println!("50u128 (BE): {:?}", 50u128.to_be_bytes());
    assert_eq!(((u64::MAX as u128) << 64 | 50) >> 64, u64::MAX as u128);
}

Prints:

50u128 (LE): [50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
50u128 (BE): [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50]

Though the assertion passes on both x86_64 and s390x, so I'm not sure what's going on.

decathorpe avatar Oct 31 '23 21:10 decathorpe

I don't understand how the error you have above for operations::test::test_add_length could be generated if the assertion in the main you wrote passes.

tkaitchuck avatar Nov 12 '23 22:11 tkaitchuck

Me neither ... I will investigate further. Thank you for looking into it.

decathorpe avatar Nov 12 '23 23:11 decathorpe

Adding some debug prints as below, and running on both s390x and x86_64 shows the difference:

diff --git i/src/operations.rs w/src/operations.rs
index a420587..5646579 100644
--- i/src/operations.rs
+++ w/src/operations.rs
@@ -159,24 +159,14 @@ pub(crate) fn aesdec(value: u128, xor: u128) -> u128 {
 #[allow(unused)]
 #[inline(always)]
 pub(crate) fn add_in_length(enc: &mut u128, len: u64) {
-    #[cfg(all(target_arch = "x86_64", target_feature = "sse2", not(miri)))]
-    {
-        #[cfg(target_arch = "x86_64")]
-        use core::arch::x86_64::*;
-
-        unsafe {
-            let enc = enc as *mut u128;
-            let len = _mm_cvtsi64_si128(len as i64);
-            let data = _mm_loadu_si128(enc.cast());
-            let sum = _mm_add_epi64(data, len);
-            _mm_storeu_si128(enc.cast(), sum);
-        }
-    }
-    #[cfg(not(all(target_arch = "x86_64", target_feature = "sse2", not(miri))))]
     {
+        println!("{:#x}", enc);
         let mut t: [u64; 2] = enc.convert();
+        println!("{:#x} {:#x}", t[0], t[1]);
         t[0] = t[0].wrapping_add(len);
+        println!("{:#x} {:#x}", t[0], t[1]);
         *enc = t.convert();
+        println!("{:#x}", enc);
     }
 }
 

On x86_64

---- operations::test::test_add_length stdout ----
0xffffffffffffffff0000000000000032
0x32 0xffffffffffffffff
0x31 0xffffffffffffffff
0xffffffffffffffff0000000000000031

On s390x

---- operations::test::test_add_length stdout ----
0xffffffffffffffff0000000000000032
0xffffffffffffffff 0x32
0xfffffffffffffffe 0x32
0xfffffffffffffffe0000000000000032

As @decathorpe was supposing, it's a big-endian issue.

jamessan avatar Jan 07 '24 02:01 jamessan

I can confirm that this passes the test on both endian types

diff --git i/src/operations.rs w/src/operations.rs
index a420587..460ce1c 100644
--- i/src/operations.rs
+++ w/src/operations.rs
@@ -175,7 +175,11 @@ pub(crate) fn add_in_length(enc: &mut u128, len: u64) {
     #[cfg(not(all(target_arch = "x86_64", target_feature = "sse2", not(miri))))]
     {
         let mut t: [u64; 2] = enc.convert();
-        t[0] = t[0].wrapping_add(len);
+        if cfg!(target_endian = "big") {
+            t[1] = t[1].wrapping_add(len);
+        } else {
+            t[0] = t[0].wrapping_add(len);
+        }
         *enc = t.convert();
     }
 }

Not sure if there are other (untested) big-endian issues still lurking elsewhere, but at least cross makes it easy to run tests for other architectures (https://github.com/ructions/cargo#cross-compilation).

jamessan avatar Jan 07 '24 02:01 jamessan

@jamessan Ok good. That change is not actually necessary for the algorithm to be correct. We can make the test less sensitive by having it work with [u64; 2] and converting for the argument.

tkaitchuck avatar Jan 08 '24 23:01 tkaitchuck

Based on your comments above I came up with the following patch, I tested it on both s390x and amd64 and the tests passed.

Does this look right to you? bigendian.patch.txt

plugwash avatar Feb 01 '24 19:02 plugwash

That worked, apparently. Thanks!

jonassmedegaard avatar Feb 03 '24 09:02 jonassmedegaard