
Fatal error in sun.net.www.MessageHeader.filterAndAddHeaders

Open morvael opened this issue 1 year ago • 27 comments

Please provide a brief summary of the bug

The WildFly application server crashed in JAX-WS code that was making an HTTP request.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00003fff930707e0, pid=11713, tid=14853
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.8.1+1 (17.0.8.1+1) (build 17.0.8.1+1)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (17.0.8.1+1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-ppc64le)
# Problematic frame:
# J 47349 c2 sun.net.www.MessageHeader.filterAndAddHeaders([Ljava/lang/String;Ljava/util/Map;)Ljava/util/Map; java.base@17.0.8.1 (334 bytes) @ 0x00003fff930707e0 [0x00003fff9306f500+0x00000000000012e0]

Please provide steps to reproduce where possible

No easy reproducer here. Write an enterprise application making JAX-WS calls and wait 5 years?
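
For anyone who wants to attempt a reproducer, here is a minimal sketch; it is not from the original report. It assumes that repeatedly calling `getHeaderFields()` on the JDK's built-in `HttpURLConnection` is enough to make `MessageHeader.filterAndAddHeaders` hot and get it C2-compiled. The class name `HeaderStress`, the `X-Test` header, and the iteration count are hypothetical, and it uses plain HTTP against a local `com.sun.net.httpserver.HttpServer` rather than the HTTPS/JAX-WS path shown in the stack trace. There is no evidence that it triggers the crash.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class HeaderStress {

    /** Sends `iterations` GET requests to a local server; returns how many responses exposed headers. */
    static int run(int iterations) throws Exception {
        // Port 0 lets the OS pick a free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Hypothetical headers; the real trigger (if any) is unknown.
            exchange.getResponseHeaders().add("Set-Cookie", "a=b");
            exchange.getResponseHeaders().add("X-Test", "value");
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            int withHeaders = 0;
            for (int i = 0; i < iterations; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                // Same public entry point that appears in the crash stack.
                Map<String, List<String>> headers = conn.getHeaderFields();
                if (!headers.isEmpty()) {
                    withHeaders++;
                }
                // Drain and close the body so the connection can be reused.
                try (InputStream in = conn.getInputStream()) {
                    in.readAllBytes();
                }
            }
            return withHeaders;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(50_000));
    }
}
```

Running it on the same JDK build and platform (linux-ppc64le here) would at least exercise the same compiled method, though the production workload clearly differs.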

Expected Results

No crash.

Actual Results

Crash.

What Java Version are you using?

openjdk version "17.0.8.1" 2023-08-24 OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1) OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (build 17.0.8.1+1, mixed mode, sharing)

What is your operating system and platform?

CentOS ppc64le, kernel 3.10.0-1160.88.1.el7.ppc64le

How did you install Java?

Binary archive.

Did it work before?

There were no such crash dumps before, and code using this has not changed for years.

Did you test with the latest update version?

This is the latest version.

Did you test with other Java versions?

No.

Relevant log output

---------------  T H R E A D  ---------------

Current thread (0x000001001f4e6e70):  JavaThread "EJB default - 2" [_thread_in_Java, id=14853, stack(0x00003fff26e50000,0x00003fff27250000)]

Stack: [0x00003fff26e50000,0x00003fff27250000],  sp=0x00003fff2724b200,  free space=4076k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 47349 c2 sun.net.www.MessageHeader.filterAndAddHeaders([Ljava/lang/String;Ljava/util/Map;)Ljava/util/Map; java.base@17.0.8.1 (334 bytes) @ 0x00003fff930707e0 [0x00003fff9306f500+0x00000000000012e0]
J 47347 c2 sun.net.www.protocol.http.HttpURLConnection.getFilteredHeaderFields()Ljava/util/Map; java.base@17.0.8.1 (207 bytes) @ 0x00003fff9306bc8c [0x00003fff9306bb00+0x000000000000018c]
J 41782 c1 sun.net.www.protocol.https.HttpsURLConnectionImpl.getHeaderFields()Ljava/util/Map; java.base@17.0.8.1 (8 bytes) @ 0x00003fff8bcb0244 [0x00003fff8bcb0000+0x0000000000000244]
J 43018 c1 com.sun.xml.ws.transport.http.client.HttpClientTransport.getHeaders()Ljava/util/Map; (44 bytes) @ 0x00003fff8cf93aa4 [0x00003fff8cf93700+0x00000000000003a4]
J 41638 c1 com.sun.xml.ws.transport.http.client.HttpTransportPipe.recordCookies(Lcom/sun/xml/ws/api/message/Packet;Lcom/sun/xml/ws/transport/http/client/HttpClientTransport;)V (65 bytes) @ 0x00003fff8cfc6f24 [0x00003fff8cfc6b00+0x0000000000000424]
J 45077 c1 com.sun.xml.ws.transport.http.client.HttpTransportPipe.createResponsePacket(Lcom/sun/xml/ws/api/message/Packet;Lcom/sun/xml/ws/transport/http/client/HttpClientTransport;)Lcom/sun/xml/ws/api/message/Packet; (312 bytes) @ 0x00003fff8c9fb0bc [0x00003fff8c9fb000+0x00000000000000bc]
J 42960 c1 com.sun.xml.ws.transport.http.client.HttpTransportPipe.process(Lcom/sun/xml/ws/api/message/Packet;)Lcom/sun/xml/ws/api/message/Packet; (467 bytes) @ 0x00003fff897a3f8c [0x00003fff897a1000+0x0000000000002f8c]
J 42959 c1 com.sun.xml.ws.transport.http.client.HttpTransportPipe.processRequest(Lcom/sun/xml/ws/api/message/Packet;)Lcom/sun/xml/ws/api/pipe/NextAction; (10 bytes) @ 0x00003fff8c21d514 [0x00003fff8c21d480+0x0000000000000094]
J 41634 c1 com.sun.xml.ws.transport.DeferredTransportPipe.processRequest(Lcom/sun/xml/ws/api/message/Packet;)Lcom/sun/xml/ws/api/pipe/NextAction; (166 bytes) @ 0x00003fff8a49dfe4 [0x00003fff8a49d580+0x0000000000000a64]
J 41311 c1 com.sun.xml.ws.api.pipe.Fiber.__doRun(Ljakarta/xml/ws/Holder;Ljava/util/List;)Z (750 bytes) @ 0x00003fff8d1f01bc [0x00003fff8d1ebe00+0x00000000000043bc]
J 41632 c1 com.sun.xml.ws.api.pipe.Fiber._doRun(Lcom/sun/xml/ws/api/pipe/Tube;)Z (548 bytes) @ 0x00003fff8c0b487c [0x00003fff8c0b3800+0x000000000000107c]
J 42953 c1 com.sun.xml.ws.api.pipe.Fiber.doRun()Z (54 bytes) @ 0x00003fff8bf2a620 [0x00003fff8bf29d00+0x0000000000000920]
J 42952 c1 com.sun.xml.ws.api.pipe.Fiber.runSync(Lcom/sun/xml/ws/api/pipe/Tube;Lcom/sun/xml/ws/api/message/Packet;)Lcom/sun/xml/ws/api/message/Packet; (303 bytes) @ 0x00003fff8afcfe60 [0x00003fff8afcf980+0x00000000000004e0]
J 42949 c1 com.sun.xml.ws.client.Stub.process(Lcom/sun/xml/ws/api/message/Packet;Lcom/sun/xml/ws/client/RequestContext;Lcom/sun/xml/ws/client/ResponseContextReceiver;)Lcom/sun/xml/ws/api/message/Packet; (166 bytes) @ 0x00003fff8c7d7988 [0x00003fff8c7d7280+0x0000000000000708]
J 41622 c1 com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(Ljava/lang/Object;[Ljava/lang/Object;Lcom/sun/xml/ws/client/RequestContext;Lcom/sun/xml/ws/client/ResponseContextReceiver;)Ljava/lang/Object; (224 bytes) @ 0x00003fff89299374 [0x00003fff89298f80+0x00000000000003f4]
J 41621 c1 com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (18 bytes) @ 0x00003fff8bcb4288 [0x00003fff8bcb4200+0x0000000000000088]
J 39278 c1 com.sun.xml.ws.client.sei.SEIStub.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (113 bytes) @ 0x00003fff8cf19b30 [0x00003fff8cf19080+0x0000000000000ab0]

morvael avatar Sep 26 '23 11:09 morvael

@morvael - Which app server and Jakarta EE stack are you using? In particular, which version of the JAXP binding?

karianna avatar Sep 28 '23 07:09 karianna

Server: WildFly 27.0.1

Dependencies: JAXB: org.glassfish.jaxb:jaxb-runtime:4.0.3 JAX-WS: com.sun.xml.ws:jaxws-rt:4.0.1

As for JAXP I have only located configuration:

javax.xml.validation.SchemaFactory\:http\://www.w3.org/2001/XMLSchema=com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory
javax.xml.validation.SchemaFactory\:http\://www.w3.org/XML/XMLSchema/v1.0=com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory
javax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
javax.xml.xpath.XPathFactory\:http\://java.sun.com/jaxp/xpath/dom=com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl
javax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
javax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
javax.xml.bind.JAXBContext=org.glassfish.jaxb.runtime.v2.JAXBContextFactory
javax.xml.datatype.DatatypeFactory=com.sun.org.apache.xerces.internal.jaxp.datatype.DatatypeFactoryImpl

morvael avatar Sep 28 '23 08:09 morvael

Can you attach the full crash log?

karianna avatar Sep 28 '23 08:09 karianna

Sorry, I had to obfuscate it a little: hs_err_pid11713.log

morvael avatar Sep 28 '23 08:09 morvael

The same issue is happening as well.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  [thread 2246 also had an error]
[thread 2216 also had an error]
[thread 2244 also had an error]
SIGSEGV (0xb) at pc=0x00007ffffe1c98b1, pid=7, tid=2245
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.19+7 (11.0.19+7) (build 11.0.19+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (11.0.19+7, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7f88b1]  G1ParScanThreadState::trim_queue_partially()+0x731
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid7.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues

RanabirChakraborty avatar Sep 30 '23 18:09 RanabirChakraborty

@karianna I can see in the issue section, many of us are facing similar problems. Is there any future plan to resolve this issue? We really need the fix :) I hope you'll understand.

RanabirChakraborty avatar Sep 30 '23 18:09 RanabirChakraborty

This looks like it might be a genuine issue, but if you want a speedy investigation and fix, I recommend you contact one of the commercial support vendors for Adoptium: https://adoptium.net/en-GB/temurin/commercial-support/

karianna avatar Oct 02 '23 00:10 karianna

@morvael Has this been reported to the Wildfly folks?

karianna avatar Oct 02 '23 00:10 karianna

I don't think it's their fault: the crash happens in java.base code, and there's also the Eclipse JAX-WS library and our own code between that and WildFly. So I didn't report it to the WildFly folks.

morvael avatar Oct 03 '23 06:10 morvael

@RanabirChakraborty The crash you posted above (SIGSEGV in G1ParScanThreadState::trim_queue_partially() on Temurin-11.0.19+7) is a separate issue from what @morvael posted. Can you please open a new issue with your crash log from 11.0.22, 17.0.10, or 21.0.2?

karianna avatar Feb 07 '24 04:02 karianna

@morvael apologies for the repeat here but can you test on 17.0.10 and post the crash log here?

karianna avatar Feb 07 '24 04:02 karianna

I'm afraid this isn't repeatable. It must have been triggered by some very specific header fields in the response, and it hasn't happened since. I have no way to replicate it :(

morvael avatar Feb 07 '24 07:02 morvael

OK will close for now in that case but please do re-open if it happens again!

karianna avatar Feb 07 '24 08:02 karianna

What a coincidence: I looked today and the server had the same problem. Sadly we're still on 17.0.9 there, so it doesn't meet the criterion of happening on 17.0.10.

morvael avatar Feb 07 '24 11:02 morvael

And again on 17.0.10.

morvael avatar Feb 27 '24 07:02 morvael

@karianna crash log from 17.0.10+7: hs_err_pid12381.log

morvael avatar Feb 27 '24 07:02 morvael

Another happened 4 days ago.

morvael avatar Mar 11 '24 09:03 morvael

How do I reopen this?

morvael avatar Mar 11 '24 09:03 morvael

Three crashes in February, two in March. I'd say it became more common. I'll try to enable full logging to see the headers that crash it.
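
One way to capture the headers is sketched below, as an assumption rather than what was actually deployed here. The JDK's built-in HTTP client logs request and response headers through the `java.util.logging` logger named `sun.net.www.protocol.http.HttpURLConnection`; that name is an internal detail and could differ between JDK versions. The class name `HttpHeaderLogging` is hypothetical, and inside WildFly the same logger would more likely be configured through the server's logging subsystem than in code.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HttpHeaderLogging {

    // Internal JDK logger name (assumption: unchanged in the JDK build in use).
    static final String LOGGER_NAME = "sun.net.www.protocol.http.HttpURLConnection";

    // java.util.logging holds loggers weakly, so keep a strong reference
    // to stop the configuration from being garbage-collected.
    private static Logger httpLogger;

    /** Raises the HTTP client logger to Level.ALL and attaches a console handler. */
    static Logger enable() {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        logger.setLevel(Level.ALL);

        // ConsoleHandler defaults to INFO, so its level must be raised as well.
        Handler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);

        httpLogger = logger;
        return logger;
    }

    public static void main(String[] args) {
        enable();
    }
}
```

Call `HttpHeaderLogging.enable()` once at startup, before the first outgoing request. The output can be large and may contain credentials or cookies, so it should be treated as sensitive.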

morvael avatar Mar 15 '24 12:03 morvael

App isolated on separate server (I guess the bug will now never reappear :D) and with full http logging.

morvael avatar Mar 21 '24 09:03 morvael

Of course when isolated and with full logging the bug doesn't want to appear. It used to manifest in 1 to 7 days.

morvael avatar Mar 27 '24 07:03 morvael

We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable. It will be closed soon unless the stale label is removed by a committer, or a new comment is made.

github-actions[bot] avatar Jun 26 '24 00:06 github-actions[bot]

In a sense I found a workaround, which was to move one of the applications to another instance of WildFly. I guess it must be related to core classes being shared between enterprise apps. But the bug itself wasn't fixed.

morvael avatar Jul 01 '24 06:07 morvael

This is magic. A day after you close the issue the bug strikes again, even in the divided app configuration (first time after 4 months of no bugs). We're soon migrating from those old servers to new ones, so I will let it lie for a while, maybe it won't happen on new servers.

morvael avatar Jul 02 '24 07:07 morvael

I assume this is on 17.0.11 or are you still on 17.0.10?

karianna avatar Jul 02 '24 09:07 karianna

Yes, I have since changed JDK twice. We follow latest release with some obvious delay :)

morvael avatar Jul 02 '24 14:07 morvael

Well, maybe moving from kernel 3.x to 5.x will make it go away. We will see...

morvael avatar Jul 02 '24 14:07 morvael