'rsa::keypair::tests::test_rsakeypair_private_exponentiate' panicked at 'attempt to subtract with overflow', aarch64-pc-windows-msvc
commit 934ee03e67fb74322af9a805c2dd19893e461d4a, CI job "test (aarch64-pc-windows-msvc, --release, 1.66.0)" failed on the main branch.
failures:
---- rsa::keypair::tests::test_rsakeypair_private_exponentiate stdout ----
thread 'rsa::keypair::tests::test_rsakeypair_private_exponentiate' panicked at 'attempt to subtract with overflow', src\arithmetic\bigint\modulus.rs:155:38
failures:
rsa::keypair::tests::test_rsakeypair_private_exponentiate
test result: FAILED. 92 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 4.69s
error: test failed, to rerun pass `-p ring --lib`
Error: Process completed with exit code 1.
This seems to happen consistently in CI. Not that it is Rust 1.66 but not Stable. Wouldn't be surprised if our RSA assembly code is violating ABI or similar. Need to find somebody to debug it.
The panic points to this line failing: https://github.com/briansmith/ring/blob/934ee03e67fb74322af9a805c2dd19893e461d4a/src/arithmetic/bigint/modulus.rs#L155
This hasn't changed for a very long time, so I'm guessing it is either a codegen bug or longstanding UB in our code. We should debug it and find out what the values of r and lg_m are.
A workaround, avoiding running the tests on aarch64-pc-windows-msvc 1.66 release, was put into CI in PR #2538.
So far, this failure has only occurred in --release mode, and only for Rust 1.66 (not stable). And it is intermittent; on most CI runs the test fails, but on some runs the job passes. Also the only failure I've seen is on this specific test.
This is easily reproducible using Rust 1.66 on the latest Windows 11 ARM64 using the latest Visual Studio using on my MacBook in Parallels:
while cargo +1.66 test --lib --release test_rsakeypair_private_exponentiate -- --nocapture ; do :; done
I did a bisect after export RUSTFLAGS="-C overflow-checks=on" (We turned on that flag in the Cargo release profile recently in 9a5d0a74614be3d00d74045691cd01baa8417b9c).
After 2a430d1982c7681777e79ee0086ee170f290d25f ("Remove lto=true in Cargo.toml profiles."), we get the "attempt to subtract with overflow" error. But before that commit, at 2a430d1982c7681777e79ee0086ee170f290d25f^ and earlier, the test fails with a different error:
thread 'rsa::keypair::tests::test_rsakeypair_private_exponentiate' panicked at 'called `Result::unwrap()` on an `Err` value: Unspecified', src\rsa\keypair.rs:679:85
The bisect narrowed it down to
commit bdce4fc0e7068e70e9172bda7dcaa63a1145ca94 (HEAD)
Author: Brian Smith <[email protected]>
Date: Mon Feb 10 15:10:22 2025 -0800
arithmetic: Reverse private exponent words during key loading.
Instead of scanning the private exponent limbs backwards during
exponentiation, reverse the limbs during key loading and then scan
forward.
Indeed, if I revert that commit on the tip of the main branch, the problem fails to reproduce.
Next, I reset to the current tip of main (reverting the revert) and then tried to reproduce with Rust 1.65 and Rust 1.64. This requires tweaking Cargo.toml:
- rust-version = "1.66.0"
+ rust-version = "1.64.0"
Then, tweaking the STR accordingly, the STR reproduce the issue using 1.65:
while cargo +1.65 test --lib --release test_rsakeypair_private_exponent
iate -- --nocapture ; do :; done
But, the issue does NOT reproduce with 1.64:
while cargo +1.64 test --lib --release test_rsakeypair_private_exponent
iate -- --nocapture ; do :; done
And the issue does NOT reproduce with 1.67 either:
while cargo +1.67 test --lib --release test_rsakeypair_private_exponentiate -- --nocapture ; do :; done
I noticed that Rust 1.65 upgraded LLVM to LLVM 15 according to https://doc.rust-lang.org/beta/releases.html#version-1650-2022-11-03. However, Rust didn't upgrade to LLVM 16 until Rust 1.70 according to https://doc.rust-lang.org/beta/releases.html#version-1700-2023-06-01.
Unfortunately, simple steps to debug the issue are thwarted as, for example, making certain relevant functions #[inline(never)] or adding logging will cause the issue to stop reproducing. So the issue would need to be debugged in a debugger and/or by comparing disassemblies.
I removed this from the 0.17.15 release milestone since it isn't a regression from 0.17.14.
I removed this from the 0.17.15 release milestone since it isn't a regression from 0.17.14.
If I read the log correctly, the first release with the following commit was 0.17.9 and the last release without it was 0.17.8. However, I doubt that this commit is to blame.
commit bdce4fc0e7068e70e9172bda7dcaa63a1145ca94 (HEAD) Author: Brian Smith <[email protected]> Date: Mon Feb 10 15:10:22 2025 -0800 arithmetic: Reverse private exponent words during key loading. Instead of scanning the private exponent limbs backwards during exponentiation, reverse the limbs during key loading and then scan forward.
Also, to clarify for others, until very recently, aarch64-pc-windows was BUILT in CI (on an x86-64 Windows runner) but the tests were not run. Only recently did we start running tests in CI for aarch64-pc-windows-msvc.
Previously, I was manually running the tests on an aarch64-pc-windows-msvc machine pre-release for the MSRV and latest stable toolchains when I found it prudent to do so. But previously MSRV was lower than 1.65 so we wouldn't have noticed it.
A workaround, avoiding running the tests on aarch64-pc-windows-msvc 1.66 release, was put into CI in PR #2538.
PR #2550 undoes this workaround, replacing it with a minimally-scoped one that just skips this one test.