corretto-11
Garbage Collector G1 fails with EXCEPTION_ACCESS_VIOLATION (0xc0000005)
How to reproduce: The crash is reproducible when creating a large number of small, short-lived objects. We are hitting it during syntax analysis.
Additional Details: This reproduces very often on Windows 10 and much more rarely on Linux. With OpenJDK11GA it is hardly reproducible. Also, with G1 GC the application required 10-15% more RAM than with ParallelGC.
System details:
openjdk version "11.0.7" 2020-04-14 LTS
OpenJDK Runtime Environment Corretto-11.0.7.10.1 (build 11.0.7+10-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.7.10.1 (build 11.0.7+10-LTS, mixed mode)
NOTE: The ParallelGC garbage collector works fine; if you reduce the available memory it fails with an OOM exception. When G1 is used, however, it always fails with EXCEPTION_ACCESS_VIOLATION.
Suggestion: Please set ParallelGC as the default GC in Java 11 Corretto.
Please see the attached error log.
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007ff994bad16f, pid=7402, tid=7403
JRE version: OpenJDK Runtime Environment (11.0.7+10) (build 11.0.7+10-LTS)
Java VM: OpenJDK 64-Bit Server VM (11.0.7+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
Problematic frame:
J 19716 c2 java.util.HashMap.get(Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.7 (23 bytes) @ 0x00007ff994bad16f [0x00007ff994baca00+0x000000000000076f]
hs_err_pid1488.log
Thank you for reporting this issue.
Could you provide sample code we could use to reproduce this issue?
Looking at the hs_err file, I found the following:
The exception happens at PC 0x0000021e29ea8420
:
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000021e29ea8420, pid=1488, tid=9492
while reading address 0x00000000200ffea8
:
siginfo: EXCEPTION_ACCESS_VIOLATION (0xc0000005), reading address 0x00000000200ffea8
Looking at the "Register to memory mapping" we can see that RBP is a compressed oop which points to a valid class in the class space:
RBP=537919132=0x200ffe9c is a compressed pointer to class: 0x00000001007ff4e0
org.antlr.v4.runtime.atn.ATNConfig {0x00000001007ff4e8}
and we see that the faulting address 0x00000000200ffea8 corresponds to "RBP + 0xc".
When disassembling the instructions at 0x0000021e29ea8420 we get:
0x0000021e29ea8420: 44 8b 45 0c mov r8d,DWORD PTR [rbp+0xc]
which confirms that we are loading from "RBP + 0xc".
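The arithmetic behind this can be checked with a short sketch. The register, fault, and class addresses below are copied from the excerpt; the decoding parameters (base 0, shift 3) are an assumption inferred from those numbers and are not stated in the log, and the class and method names are made up for illustration.

```java
// Values copied from the hs_err excerpt above. The encoding base of 0 and the
// shift of 3 are assumptions inferred from those values, not taken from the log.
final class CompressedPointerCheck {
    static final long RBP = 0x200ffe9cL;            // register value (537919132)
    static final long FAULT_ADDRESS = 0x200ffea8L;  // "reading address" from siginfo
    static final long KLASS_ADDRESS = 0x1007ff4e0L; // class address reported for RBP

    /** Address touched by an instruction such as `mov r8d, DWORD PTR [rbp+0xc]`. */
    static long loadAddress(long base, long displacement) {
        return base + displacement;
    }

    /** Decodes a narrow pointer as encodingBase + (narrow << shift). */
    static long decode(long narrow, long encodingBase, int shift) {
        return encodingBase + (narrow << shift);
    }

    public static void main(String[] args) {
        System.out.printf("rbp + 0xc   = 0x%x%n", loadAddress(RBP, 0xc));
        System.out.printf("decoded rbp = 0x%x%n", decode(RBP, 0L, 3));
    }
}
```

Under these assumptions the raw register value plus 0xc gives exactly the faulting address, while the decoded value gives the class address from the register mapping, which is consistent with the load having used the undecoded value.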
Further down in the assembly we find:
0x0000021e29ea84c3: 44 8b 4d 18 mov r9d,DWORD PTR [rbp+0x18]
0x0000021e29ea84c7: 41 85 02 test DWORD PTR [r10],eax
0x0000021e29ea84ca: 45 85 c9 test r9d,r9d
0x0000021e29ea84cd: 0f 84 a3 eb ff ff je 0x21e29ea7076
0x0000021e29ea84d3: 49 8b e9 mov rbp,r9
0x0000021e29ea84d6: e9 45 ff ff ff jmp 0x21e29ea8420
What happens here is that we load from RBP+0x18, and if the result is not zero, store it back in RBP. After that we jump back to 0x0000021e29ea8420, where we want to load from "RBP + 0xc" but crash because RBP is a compressed class pointer.
So the problem is that we try loading from a compressed class pointer in C2-compiled code without "decompressing" the pointer. Why this happens is not clear to me yet.
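For readers less used to C2 output, the loop described above has the same shape as walking a singly linked chain: read a field of the current element, load the link to the next element, stop if it is null, otherwise make it the current element and repeat. The sketch below only illustrates that shape. The `Node` class and its fields are hypothetical and are not taken from the crashing method, which the excerpt does not identify.

```java
final class ChainWalk {
    /** Hypothetical element type; the field names are illustrative only. */
    static final class Node {
        final int value;  // plays the role of the field read at [rbp+0xc]
        final Node next;  // plays the role of the field read at [rbp+0x18]

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    /** Sums the values along the chain, following `next` until it is null. */
    static int sum(Node head) {
        int total = 0;
        Node current = head;
        while (current != null) {
            total += current.value;   // load a field of the current element
            current = current.next;   // load the link; the loop repeats while it is non-null
        }
        return total;
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(sum(chain));
    }
}
```

In a loop like this, every value assigned to `current` has to be a usable object reference before its fields are read, which is the step that appears to go wrong in the compiled code.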
@9sokolov ideally, could you provide the My.jar file you used to reproduce the problem? This would greatly help us find the root cause. If that's not possible, could you please try to reproduce the problem with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining" and send us that output together with the exact version of org/antlr/v4/runtime/atn/ATNConfig used by your application? But as mentioned before, getting My.jar would really be preferred :)
We've provided all the JARs separately. Hope that helps.
Is there any progress on fixing this issue?
I have observed it while verifying if https://github.com/aws/aws-iot-device-sdk-java-v2/issues/134 has already been fixed.
Is there a known workaround (i.e. correct configuration for ParallelGC)?
I have tried enforcing ParallelGC in the IntelliJ config Java class invocation:
D:\Tools\amazon-corretto-11.0.11.9.1-windows-x64-jdk\jdk11.0.11_9\bin\java.exe -XX:+UseParallelGC -javaagent:D:\Tools\ideaIU-211.6556.6.win\lib\idea_rt.jar=57523:D:\Tools\ideaIU-211.6556.6.win\bin -Dfile.encoding=UTF-8 -classpath D:\git\krzwil\mqtt-check-java-v2\target\classes;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\iotdevicesdk\aws-iot-device-sdk\1.3.1\aws-iot-device-sdk-1.3.1.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\crt\aws-crt\0.12.4\aws-crt-0.12.4.jar;C:\Users\krzwil\.m2\repository\com\google\code\gson\gson\2.8.5\gson-2.8.5.jar;C:\Users\krzwil\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.12.0\jackson-core-2.12.0.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\auth\2.16.58\auth-2.16.58.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\annotations\2.16.58\annotations-2.16.58.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\utils\2.16.58\utils-2.16.58.jar;C:\Users\krzwil\.m2\repository\org\reactivestreams\reactive-streams\1.0.3\reactive-streams-1.0.3.jar;C:\Users\krzwil\.m2\repository\org\slf4j\slf4j-api\1.7.30\slf4j-api-1.7.30.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\sdk-core\2.16.58\sdk-core-2.16.58.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\metrics-spi\2.16.58\metrics-spi-2.16.58.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\regions\2.16.58\regions-2.16.58.jar;C:\Users\krzwil\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.12.3\jackson-annotations-2.12.3.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\profiles\2.16.58\profiles-2.16.58.jar;C:\Users\krzwil\.m2\repository\software\amazon\awssdk\http-client-spi\2.16.58\http-client-spi-2.16.58.jar;C:\Users\krzwil\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.12.1\jackson-databind-2.12.1.jar;C:\Users\krzwil\.m2\repository\software\amazon\eventstream\eventstream\1.0.1\eventstream-1.0.1.jar MqttCheck
but the same EXCEPTION_ACCESS_VIOLATION has been detected.
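When testing a GC flag as a workaround, it can help to confirm which collector the JVM actually selected. A minimal sketch using the standard `java.lang.management` API follows; the class name `ActiveGc` is made up for illustration. On HotSpot the reported names are typically "PS Scavenge" and "PS MarkSweep" for ParallelGC, and "G1 Young Generation" and "G1 Old Generation" for G1, though the exact strings may vary between builds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveGc {
    /** Returns the names of the collectors registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with the same JVM flags as the application under test,
        // for example -XX:+UseParallelGC.
        System.out.println(collectorNames());
    }
}
```

Running this with the same java.exe and the same options as the failing invocation shows whether -XX:+UseParallelGC took effect.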
This issue has also been encountered in corretto-17#28. Closing that issue as duplicate for now and tracking here.