
Illegal instruction on Android Virtual Device running on Github Action Runner

Open cre4ture opened this issue 4 months ago • 0 comments

Hello,

I'm working on an issue in the Android/Termux CI of the uutils project. The problem is currently only reproducible in our CI; I can't reproduce it on my private PC.

After adding more logs and several CI test-runs, I managed to trace it down to the half-rs crate:

#[test]
fn test_f16c_f16_to_f32_direct() {
    let bo = ::od::byteorder_io::ByteOrder::Little;
    let bits = bo.read_u16(&[0x00, 0x3c]);

    let result_f16 = half::f16::from_bits(bits);
    let result = f64::from(result_f16); // crashes here
    assert_eq!(1.0, result);
}

The function in half-rs responsible for converting f16 to f64 is this:

#[inline]
pub(crate) fn f16_to_f64(i: u16) -> f64 {
    convert_fn! {
        if x86_feature("f16c") {
            unsafe { x86::f16_to_f32_x86_f16c(i) as f64 }   // crashes here
        } else if aarch64_feature("fp16") {
            unsafe { aarch64::f16_to_f64_fp16(i) }
        } else {
            f16_to_f64_fallback(i)
        }
    }
}

I tried to trace it down even further by extracting the relevant code into its own test. This test fails with signal 4 (SIGILL), which means "illegal instruction".

#[test]
fn test_f16c_direct() {

    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{_mm_cvtph_ps, __m128i, __m128};
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{_mm_cvtph_ps, __m128i, __m128};

    let i = 0u16;

    let result: f32 = unsafe {
        let mut vec = std::mem::MaybeUninit::<__m128i>::zeroed();
        vec.as_mut_ptr().cast::<u16>().write(i);
        let retval = _mm_cvtph_ps(vec.assume_init());
        *(&retval as *const __m128).cast()
    };

    assert_eq!(0.0, result);
}

#[test]
fn test_if_f16c_is_detected() {

    let detected = std::is_x86_feature_detected!("f16c");
    if detected {
        println!("f16c detected!");
    } else {
        println!("f16c not detected!");
    }

    assert_eq!(true, detected);
}

According to the last test (test_if_f16c_is_detected), f16c is supported. /proc/cpuinfo also lists f16c as supported. I can't explain why it's crashing then.

I have the feeling that it's really a bug in the Android emulator: it reports f16c as supported, but then fails to execute the instruction.

Do you have experience with this kind of error? What can I do to reproduce it locally? Is it related to the actual CPU? The GitHub Actions runner has an AMD EPYC 7763 64-Core Processor; my machine has an Intel i7. Does the GitHub runner itself run in its own VM, and could the VM-in-VM setup cause this issue? We are using reactivecircus/android-emulator-runner to create the Android virtual device.

Any hints or clues? Is there another way to test whether the f16c instruction set is supported, so that we can file a bug report with Google?
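One possible cross-check, offered as a sketch and not as a confirmed diagnosis: read the raw CPUID bits directly and also look at XCR0. F16C instructions are VEX-encoded, so besides the F16C bit in CPUID leaf 1 (ECX bit 29), the OS (here: the emulator's guest kernel) has to enable SSE and AVX state in XCR0 (bits 1 and 2), which is only readable when OSXSAVE (ECX bit 27) is set. The names `f16c_usable`, `read_raw_cpuid` and `test_f16c_raw_cpuid` below are made up for illustration; whether the emulator actually misreports any of these bits is an assumption that this test would only help to check.

```rust
// CPUID leaf 1, ECX bit positions.
const OSXSAVE_BIT: u32 = 1 << 27;
const F16C_BIT: u32 = 1 << 29;
// XCR0 bits 1 (SSE state) and 2 (AVX state); both must be enabled by the OS.
const XCR0_SSE_AVX: u64 = 0b110;

/// Pure decision logic on raw register values (hypothetical helper).
/// `xcr0` is `None` when OSXSAVE is clear, i.e. XGETBV must not be executed.
fn f16c_usable(ecx: u32, xcr0: Option<u64>) -> bool {
    let cpu_ok = ecx & F16C_BIT != 0;
    let os_ok = matches!(xcr0, Some(v) if v & XCR0_SSE_AVX == XCR0_SSE_AVX);
    cpu_ok && os_ok
}

/// Reads CPUID.1:ECX and, if OSXSAVE is set, XCR0 (hypothetical helper).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn read_raw_cpuid() -> (u32, Option<u64>) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{__cpuid, _xgetbv};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{__cpuid, _xgetbv};

    #[allow(unused_unsafe)]
    let ecx = unsafe { __cpuid(1) }.ecx;
    // XGETBV itself raises an illegal-instruction fault without OSXSAVE.
    let xcr0 = if ecx & OSXSAVE_BIT != 0 {
        Some(unsafe { _xgetbv(0) })
    } else {
        None
    };
    (ecx, xcr0)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[test]
fn test_f16c_raw_cpuid() {
    let (ecx, xcr0) = read_raw_cpuid();
    println!("CPUID.1:ECX = {:#010x}, XCR0 = {:x?}", ecx, xcr0);
    println!("F16C bit set:         {}", ecx & F16C_BIT != 0);
    println!("OSXSAVE bit set:      {}", ecx & OSXSAVE_BIT != 0);
    println!("raw check says usable: {}", f16c_usable(ecx, xcr0));
    println!("std detection says:    {}", std::is_x86_feature_detected!("f16c"));
}
```

Running this in the CI with `cargo test -- --nocapture` would print the raw values next to the result of `is_x86_feature_detected!`. If the raw check and the std detection disagree, or both report support while `_mm_cvtph_ps` still traps, that output could go into the bug report.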

cre4ture, Feb 11 '24 14:02