jdk
jdk copied to clipboard
8295159: DSO created with -ffast-math breaks Java floating-point arithmetic
A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics.
The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522
One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now.
However, this fix is not complete. dlopen()
is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions.
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8295159: DSO created with -ffast-math breaks Java floating-point arithmetic
Reviewers
- Magnus Ihse Bursie (@magicus - Reviewer) ⚠️ Review applies to af243d47
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661
$ git checkout pull/10661
Update a local copy of the PR:
$ git checkout pull/10661
$ git pull https://git.openjdk.org/jdk pull/10661/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 10661
View PR using the GUI difftool:
$ git pr show -t 10661
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10661.diff
:wave: Welcome back aph! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@theRealAph The following labels will be automatically applied to this pull request:
-
build
-
hotspot
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how.
Webrevs
- 19: Full - Incremental (b1412626)
- 18: Full (80ce877b)
- 17: Full - Incremental (53388f97)
- 16: Full - Incremental (30a1381d)
- 15: Full - Incremental (bd51efde)
- 14: Full - Incremental (2ca6f8c4)
- 13: Full - Incremental (c554746b)
- 12: Full (393a4384)
- 11: Full - Incremental (91eb9bb7)
- 10: Full - Incremental (7cba08d0)
- 09: Full - Incremental (a3305e5c)
- 08: Full - Incremental (01f6e224)
- 07: Full (c56adbd9)
- 06: Full - Incremental (64ef36f3)
- 05: Full - Incremental (3f442beb)
- 04: Full - Incremental (bcff4597)
- 03: Full - Incremental (fe388b8b)
- 02: Full - Incremental (d5655d4e)
- 01: Full - Incremental (8d834739)
- 00: Full (af243d47)
It seems strange to me that the native library part is here:
test/hotspot/jtreg/runtime/jni/TestDenormalFloat/libfast-math.c
and the two test files are here:
test/hotspot/jtreg/compiler/floatingpoint/TestDenormalDouble.java test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java
And the two tests don't have "@run main/native"... Maybe I'm missing something about what you're trying to test here.
This appears to be a 10 year old bug in gcc. Have we ever had any issues reported because of this? Inserting a workaround now seems rather late. Any FP-using native code can potentially break Java's FP semantics if it messes with the FPU control word (ref the old Borland compilers).
Shouldn't any workaround only be needed for the internals of System.loadLibrary
as other JDK usages of dlopen
should know what they are opening and that they are libraries that don't have this problem?
Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how.
@theRealAph We currently have a more or less 1-to-1 relationship between OS and compiler. From a portability perspective this is not ideal, but it is also hard to keep some kind of theoretical barrier when it never is tested.
Are you worried that this is a problem that occurs with gcc on other platforms as well, or that it occurs on linux with other compilers than gcc?
We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..?
@theRealAph This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8295159: DSO created with -ffast-math breaks Java floating-point arithmetic
Reviewed-by: ihse, dholmes, stuefe
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 8 new commits pushed to the master
branch:
- 1a21c1a783d64ca0930c358c06a43975f96ffac6: 8318736: com/sun/jdi/JdwpOnThrowTest.java failed with "transport error 202: bind failed: Address already in use"
- 81db1721d4fac954003441ca2c3c29b0998d310d: 8318955: Add ReleaseIntArrayElements in Java_sun_awt_X11_XlibWrapper_SetBitmapShape XlbWrapper.c to early return
- be01caf30de5a70684fab25b02f2c6ab346a60cb: 8319323: FFM: Harmonize the @throws tags in the javadocs
- ec79ab4b3cd89c2c0a9c8550cd62433bd6d45266: 8319268: Build failure with GCC8.3.1 after 8313643
- c788160f8acea7b58b54ad857b601bb7ffb53f8e: 8296240: Augment discussion of test tiers in doc/testing.md
- ffaecd4aa23ca15e765784858da82b632c72dfc2: 8315364: Assert thread state invariant for JFR stack trace capture
- 3b65b8797a0798474947d38d3facd17b3e89c602: 8316028: Update FreeType to 2.13.2
- 9dc40ba48ec15d5775537c4c2224ee5bb563629a: 8319195: Move most tier 1 vector API regression tests to tier 3
Please see this link for an up-to-date comparison between the source branch of this pull request and the master
branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master
branch, type /integrate in a new comment.
It seems strange to me that the native library part is here:
test/hotspot/jtreg/runtime/jni/TestDenormalFloat/libfast-math.c
and the two test files are here:
test/hotspot/jtreg/compiler/floatingpoint/TestDenormalDouble.java test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java
I already moved it.
And the two tests don't have "@run main/native"... Maybe I'm missing something about what you're trying to test here.
Will fix.
This appears to be a 10 year old bug in gcc. Have we ever had any issues reported because of this?
That's what is so insidious about messing with the floating-point control word: all that happens is people get slightly inaccurate results, which they don't notice.
Inserting a workaround now seems rather late. Any FP-using native code can potentially break Java's FP semantics if it messes with the FPU control word (ref the old Borland compilers).
Indeed, but there seem to be a number of libraries out there compiled with -ffast-math. This is the kind of thing people do with highly performance-critical stuff like ray tracing, etc. But we don't want it to break Java.
Shouldn't any workaround only be needed for the internals of
System.loadLibrary
as other JDK usages ofdlopen
should know what they are opening and that they are libraries that don't have this problem?
Maybe, but sometimes we use system- or user-provided librries.
Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how.
@theRealAph We currently have a more or less 1-to-1 relationship between OS and compiler. From a portability perspective this is not ideal, but it is also hard to keep some kind of theoretical barrier when it never is tested.
I see.
Are you worried that this is a problem that occurs with gcc on other platforms as well, or that it occurs on linux with other compilers than gcc?
The former.
We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..?
Probably, yes. I'm happy just to fix it on Linux.
We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..?
Indeed. The problem is that we have no idea what compiler was used to compile the libraries we call, so I'd like to be conservative. The increase in runtime due to saving and restoring the floating-point environment is very low, given that dlopen()
is a pretty expensive operation.
So you want to provide this dlopen wrapper for other platforms as well?
So you want to provide this dlopen wrapper for other platforms as well?
I'm not sure. It sounds to me like a Linux-only patch would do it, but I think BSD uses GCC sometimes. (Open/FreeBSD is downstream of OpenJDK, and I'd love to get it into mainline, but getting BSD contributors to sign OCA is hard.)
This appears to be a 10 year old bug in gcc. Have we ever had any issues reported because of this?
That's what is so insidious about messing with the floating-point control word: all that happens is people get slightly inaccurate results, which they don't notice.
e.g. https://github.com/gevent/gevent/pull/1820 is for Python (SciPy) and adversely affects time to converge. I have no idea if other Java users have similar problems, but it's a bug that I'd like to go away.
Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time?
We already have similar problems with MXCSR register and provide verification logic (part of -Xcheck:jni
) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when -Xcheck:jni
is enabled.
Should we do something similar here instead?
For upcalls on non-Windows platforms, we also save MXCSR and restore it after the call, and load a set standard value for the Java code that's about to be executed. (I'm not sure why this isn't done on Windows, tbh)
Relevant code for JNI is here:
- Downcalls: https://github.com/openjdk/jdk/blob/1961e81e02e707cd0c8241aa3af6ddabf7668589/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5002
- Upcalls: https://github.com/openjdk/jdk/blob/1961e81e02e707cd0c8241aa3af6ddabf7668589/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L262
FPCSR is only handled on non _LP64 it looks like.
I agree with Vladimir that this seems like a general problem of foreign code potentially messing with control bits (in theory, foreign code could violate its ABI in other ways as well). It seems that both major C/C++ x64 ABIs (1, 2, 3) treat the control bits as non-volatile, so the callee should preserve them. This is in theory a choice of a particular ABI, but I think in general we can assume foreign code does not modify the control bits. Though, we never know for sure of course, and I suppose this is where RestoreMXCSROnJNICalls
comes in. There's no equivalent flag for FPCSR atm AFAICS, so the answer there seems to be "just don't do it".
Following that logic: from our perspective dlopen
violates its ABI in certain cases. Preserving the control bits across calls to dlopen
seems like a pragmatic solution. I'm not sure how important it is to have an opt-in for the current (broken) behavior...
Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time?
We already have similar problems with MXCSR register and provide verification logic (part of
-Xcheck:jni
) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when-Xcheck:jni
is enabled.Should we do something similar here instead?
I tend to agree. As others have observed, a dlopen
call (or something with same nefarious behavior) could also happen inside JNI code. But I think the interesting (and perhaps surprising) part here is that, from the perspective of the developer, no native code has executed - only a library has been loaded (via System::loadLibrary
).
Note also that this specific problem is triggered by dlopen
itself, because certain libraries might have some "bad" (from the perspective of JVM) initialization code. But since we're talking about JNI, JNI_OnLoad is another potential source of problem, as its native code is executed as soon as the library is loaded (and that, too, can leave the JVM in a bad state).
Note also that this specific problem is triggered by dlopen itself, because certain libraries might have some "bad" (from the perspective of JVM) initialization code.
... and it leads to another question whether the JVM itself is amenable to such problems. Do we need to sanitize the environment when returning from calls into the JVM?
Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time?
Yes.
We already have similar problems with MXCSR register and provide verification logic (part of
-Xcheck:jni
) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when-Xcheck:jni
is enabled.Should we do something similar here instead?
The problem is that this bug is very insidious: the user probably doesn't know that there's anything wrong, and almost certainly has no idea that it's anything to do with JNI.
Saving and restoring the floating-point environment across dlopen() is a compromise between adding extra code at JNI callouts, which can be expensive, and shifting the burden to the user.
Following that logic: from our perspective
dlopen
violates its ABI in certain cases. Preserving the control bits across calls todlopen
seems like a pragmatic solution. I'm not sure how important it is to have an opt-in for the current (broken) behavior...
Not at all, I'd have thought.
Here's another possibility: save the FP environment over up- and down-calls, and restore it if the control bits have changed. We can do that in a portable way, overriding it in a more performant way with arch-specific code.
The problem is that this bug is very insidious: the user probably doesn't know that there's anything wrong, and almost certainly has no idea that it's anything to do with JNI.
I'm still trying to grasp why the current problem is something different from what we experienced before.
Some platforms (x86_32 and AArch32) provide a way to restore FP environment, but it is turned off by default. Why x86_64 case is different, so it requires the logic to be turned on by default? Does the same reasoning apply to those platforms as well and we want to have AlwaysRestoreFPU
turned on by default?
$ grep -r AlwaysRestoreFPU src/hotspot/
src/hotspot//cpu/x86/templateInterpreterGenerator_x86.cpp: if (AlwaysRestoreFPU) {
src/hotspot//cpu/x86/sharedRuntime_x86_32.cpp: if (AlwaysRestoreFPU) {
src/hotspot//cpu/arm/sharedRuntime_arm.cpp: if (AlwaysRestoreFPU) {
src/hotspot//cpu/arm/templateInterpreterGenerator_arm.cpp: if (AlwaysRestoreFPU) {
src/hotspot//share/runtime/globals.hpp: product(bool, AlwaysRestoreFPU, false, \
Mailing list message from Andrew Haley on build-dev:
On 10/13/22 19:13, Vladimir Ivanov wrote:
On Thu, 13 Oct 2022 07:47:20 GMT, Andrew Haley <aph at openjdk.org> wrote:
The problem is that this bug is very insidious: the user probably doesn't know that there's anything wrong, and almost certainly has no idea that it's anything to do with JNI.
I'm still trying to grasp why the current problem is something different from what we experienced before.
I don't know anything about what we experienced before. In any case, I'm not sure that precedent applies here: the problem is that there are shared libraries in use which break IEEE semantics. This is an insidious bug, and my proposed fix costs almost nothing.
Some platforms (x86_32 and AArch32) provide a way to restore FP environment, but it is turned off by default. Why x86_64 case is different, so it requires the logic to be turned on by default? Does the same reasoning apply to those platforms as well and we want to have `AlwaysRestoreFPU` turned on by default?
My patch is not in any way x86_64-specific. At present it is Linux-specific, but I have my doubts that it should be.
x86-32 has a very specific problem: its default floating-point precision is AFAIK extended float, which does not meet Java specs. Also, restoring the FPU after every JNI call is expensive.
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
FTR I did an exercise in source code archeology and here are my findings.
The origin of AlwaysRestoreFPU
-related code (both in x86-32 and arm-specific code) can be traced back to JDK-6487931 and JDK-6550813. Though both issues manifested as JVM crashes, the underlying problem was identified as FPU control word corruption by native code.
The regression test does trigger the corruption from a JNI call (using either _FPU_SETCW
[1] or _controlfp
[2]), but it was deliberately limited to x86-32.
Based on that, I conclude that the problem with FP environment corruption by native code was known before, but an opt-in solution was chosen.
(Frankly speaking, I don't know why an opt-in solution was considered sufficient. Maybe because it was erroneously believed it can only lead to a crash at runtime?)
Considering we are now aware about insidious nature of the problem (silent result corruption), I'm inclined to propose either to turn on AlwaysRestoreFPU
by default (and provide implementation on platforms where it is missed) or, at least, catch FPU control word corruption on native->java transitions and crash the JVM advertising -XX:+AlwaysRestoreFPU
as a solution.
[1] https://man7.org/linux/man-pages/man3/__setfpucw.3.html [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=msvc-170
Correction: I was under the false impression the FP-state was shared at the process level but I've been told it is per-thread. So the scope of the problem is much narrower.
The pre-existing "fixes" in this area were an opt-in band-aid while people migrated away from using problematic native libraries, or else got them fixed.
For the benefit of the mailing list my previous comment has been edited/corrected:
Correction: I was under the false impression the FP-state was shared at the process level but I've been told it is per-thread. So the scope of the problem is much narrower.
Some additional info to consider: the library modifies only MXCSR register and -XX:+RestoreMXCSROnJNICalls
makes the test to pass.
$ objdump -D libfast-math.so
...
0000000000000560 <set_fast_math>:
560: f3 0f 1e fa endbr64
564: 0f ae 5c 24 fc stmxcsr -0x4(%rsp)
569: 81 4c 24 fc 40 80 00 orl $0x8040,-0x4(%rsp)
570: 00
571: 0f ae 54 24 fc ldmxcsr -0x4(%rsp)
576: c3 ret
On non-windows x86-64 the JVM saves/restores (and adjusts if needed) MXCSR inside call stub as mandated by the ABI: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L262-L273
x87 is not used in x86-64-specific code anymore, so x87-specific part of FP environment (FPSR and FPCS registers) is not relevant to JVM anymore there.
So, IMO the discussion boils down to how we want a misbehaving native library to be handled by the JVM.
The ABI lists MXCSR as a callee-saved register, so there's nothing wrong on JVM side from that perspective.
From a quality of implementation perspective though, JVM could do a better job at catching broken libraries. Of course, there are numerous ways for a native code to break the JVM, but in this particular case, it looks trivial to catch the problem. The question is how much overhead we can afford to introduce for that. Whether it should be an opt-in solution (e.g., -Xcheck:jni
or -XX:+AlwaysRestoreFPU
/-XX:+RestoreMXCSROnJNICalls
), opt-out (unconditionally recover or report an error when FP env is corrupted, optionally providing a way to turn it off), or apply a band-aid fix just to fix the immediate problem with GCC's fast-math mode.
I'd like to dissuade from going with just a band-aid fix (we already went through that multiple times with different level of success) and try to improve the overall experience JVM provides. It feels like just pushing the problem further away and it would be very unfortunate to repeat the very same exercise in the future.
My preferred solution would be to automatically detect the corruption and restore MXCSR register across a JNI call, but if it turns out to be too expensive, JVM could check for MXCSR register corruption after every JNI call and crash issuing a message with diagnostic details about where corruption happened (info about library and entry) offering to turn on -XX:+AlwaysRestoreFPU
/-XX:+RestoreMXCSROnJNICalls
as a stop-the-gap solution. It would send users a clear signal there's something wrong with their code/environment, but still giving them an option to workaround the problem while fixing the issue.
Saying that, I'd like to stress that I'm perfectly fine with addressing the general issue of misbehaving native libraries separately (if we agree it's worth it) and I trust @dholmes-ora and @theRealAph to choose the most appropriate fix for this particular bug.