trino icon indicating copy to clipboard operation
trino copied to clipboard

HTTP ERROR 500 org.eclipse.jetty.io.EofException

Open hsushmitha opened this issue 8 months ago • 6 comments

we are seeing frequent EOFException in Trino:

Error fetching http://<ip>:8080/v1/task/20250401_213932_04859_2vpcz.1.12.3/results/4/0: Expected response code to be 200, but was 500:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 org.eclipse.jetty.io.EofException</title>
</head>
<body>
<h2>HTTP ERROR 500 org.eclipse.jetty.io.EofException</h2>
<table>
<tr><th>URI:</th><td>http://<ip>/v1/task/20250401_213932_04859_2vpcz.1.12.3/results/4/0</td></tr>
<tr><th>STATUS:</th><td>500</td></tr>
<tr><th>MESSAGE:</th><td>org.eclipse.jetty.io.EofException</td></tr>
</table>
<h3>Caused by:</h3><pre>org.eclipse.jetty.io.EofException
	at org.eclipse.jetty.server.internal.HttpChannelState.onClose(HttpChannelState.java:506)
	at org.eclipse.jetty.server.internal.HttpConnection.close(HttpConnection.java:623)
	at org.eclipse.jetty.util.IO.close(IO.java:615)
	at org.eclipse.jetty.util.IO.close(IO.java:630)
	at org.eclipse.jetty.io.ManagedSelector$CloseConnections.update(ManagedSelector.java:1000)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.processUpdates(ManagedSelector.java:577)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:544)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:514)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.MonitoredQueuedThreadPool$1.run(MonitoredQueuedThreadPool.java:73)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
	at java.base/java.lang.Thread.run(Thread.java:1575)
</pre>

</body>
</html>

We are seeing PAGE_TRANSPORT_ERROR as error code Trino version: 472.

hsushmitha avatar Apr 02 '25 09:04 hsushmitha

We also saw this error. We upgraded from 461 -> 474. Before 474 we didn't saw it at all, no config, or queries changes

netanelm-upstream avatar Apr 14 '25 07:04 netanelm-upstream

we downgraded trino from 472 -> 467, we still see this EOFException with jetty error.

hsushmitha avatar Apr 16 '25 06:04 hsushmitha

I was seeing these errors too. I think most of them have gone away by adding this to the config:

internal-communication.http2.enabled=false

I'm seeing other unexplainable 500 errors in Trino 472 though. But the Expected response code to be 200, but was 500 ones have gone away after setting that config.

codyfrisby avatar Apr 29 '25 14:04 codyfrisby

We are seeing similar errors on 469 with internal-communication.http2.enabled=false. The Trino cluster runs fine for a few hours after start up and then a lot of workers end up in a stuck state.

StackTrace

[2025-05-02 18:12:25.862307] 2025-05-02T18:12:25.861Z ERROR async-http-response-5254  org.glassfish.jersey.server.ServerRuntime$Responder Error while closing the output stream in order to commit response.
[2025-05-02 18:12:25.862343] java.io.UncheckedIOException: org.eclipse.jetty.io.EofException[2025-05-02 18:12:25.862359]  at org.glassfish.jersey.message.internal.OutboundMessageContext.close(OutboundMessageContext.java:570)[2025-05-02 18:12:25.862372]  at org.glassfish.jersey.server.ContainerResponse.close(ContainerResponse.java:406)[2025-05-02 18:12:25.862403]  at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:763)[2025-05-02 18:12:25.862415]  at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:398)[2025-05-02 18:12:25.862426]  at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:388)[2025-05-02 18:12:25.862437]  at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:926) [2025-05-02 18:12:25.862447]  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) 
[2025-05-02 18:12:25.862457]  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)[2025-05-02 18:12:25.862467]  at org.glassfish.jersey.internal.Errors.process(Errors.java:292) 
[2025-05-02 18:12:25.862493]  at org.glassfish.jersey.internal.Errors.process(Errors.java:274)[2025-05-02 18:12:25.862507]  at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
[2025-05-02 18:12:25.862518]  at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
[2025-05-02 18:12:25.862529]  at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:963)
[2025-05-02 18:12:25.862543]  at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:914)
[2025-05-02 18:12:25.862555]  at io.airlift.jaxrs.AsyncResponseHandler$1.onSuccess(AsyncResponseHandler.java:99)
[2025-05-02 18:12:25.862566]  at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1139)
[2025-05-02 18:12:25.862585]  at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)[2025-05-02 18:12:25.862599]  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
[2025-05-02 18:12:25.862611]  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
[2025-05-02 18:12:25.862619]  at java.base/java.lang.Thread.run(Thread.java:1575)
[2025-05-02 18:12:25.862628] Caused by: org.eclipse.jetty.io.EofException
[2025-05-02 18:12:25.862640]  at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:118)
[2025-05-02 18:12:25.862649]  at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
[2025-05-02 18:12:25.862658]  at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:275)
[2025-05-02 18:12:25.862668]  at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:254)
[2025-05-02 18:12:25.862679]  at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:368)
[2025-05-02 18:12:25.862692]  at org.eclipse.jetty.server.internal.HttpConnection$SendCallback.process(HttpConnection.java:834)
[2025-05-02 18:12:28.440831]  at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:262)[2025-05-02 18:12:28.440843]  at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:243)[2025-05-02 18:12:28.440856]  at org.eclipse.jetty.server.internal.HttpConnection$HttpStreamOverHTTP1.send(HttpConnection.java:1427)[2025-05-02 18:12:28.440869]  at org.eclipse.jetty.server.HttpStream$Wrapper.send(HttpStream.java:185)[2025-05-02 18:12:28.440883]  at org.eclipse.jetty.server.internal.HttpChannelState$ChannelResponse.write(HttpChannelState.java:1300)[2025-05-02 18:12:28.440895]  at org.eclipse.jetty.server.Response$Wrapper.write(Response.java:768)[2025-05-02 18:12:28.440908]  at org.eclipse.jetty.server.handler.EventsHandler$EventsResponse.write(EventsHandler.java:353)[2025-05-02 18:12:28.440920]  at org.eclipse.jetty.server.Response$Wrapper.write(Response.java:768)[2025-05-02 18:12:28.440932]  at org.eclipse.jetty.server.handler.ContextResponse.write(ContextResponse.java:56)[2025-05-02 18:12:28.440946]  at org.eclipse.jetty.ee10.servlet.ServletContextResponse.write(ServletContextResponse.java:288)[2025-05-02 18:12:28.440958]  at org.eclipse.jetty.ee10.servlet.HttpOutput.channelWrite(HttpOutput.java:215)[2025-05-02 18:12:28.440972]  at org.eclipse.jetty.ee10.servlet.HttpOutput.channelWrite(HttpOutput.java:200)[2025-05-02 18:12:28.440984]  at org.eclipse.jetty.ee10.servlet.HttpOutput.write(HttpOutput.java:848)[2025-05-02 18:12:28.440997]  at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:298)[2025-05-02 18:12:28.441010]  at java.base/java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:170)[2025-05-02 18:12:28.441023]  at org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:286)[2025-05-02 18:12:28.441036]  at org.glassfish.jersey.message.internal.CommittingOutputStream.commit(CommittingOutputStream.java:235)
[2025-05-02 18:12:28.441048]  at org.glassfish.jersey.message.internal.CommittingOutputStream.close(CommittingOutputStream.java:250)
[2025-05-02 18:12:28.441060]  at org.glassfish.jersey.message.internal.OutboundMessageContext.close(OutboundMessageContext.java:568)
[2025-05-02 18:12:28.441065]  ... 19 more
[2025-05-02 18:12:28.441074] Caused by: java.io.IOException: Broken pipe
[2025-05-02 18:12:28.441084]  at java.base/sun.nio.ch.SocketDispatcher.$$YJP$$writev0(Native Method)
[2025-05-02 18:12:28.441094]  at java.base/sun.nio.ch.SocketDispatcher.writev0(SocketDispatcher.java)
[2025-05-02 18:12:28.441112]  at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:66)
[2025-05-02 18:12:28.441123]  at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:227)
[2025-05-02 18:12:28.441131]  at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:158)
[2025-05-02 18:12:28.441141]  at java.base/sun.nio.ch.SocketChannelImpl.implWrite(SocketChannelImpl.java:597)
[2025-05-02 18:12:28.441156]  at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:631)
[2025-05-02 18:12:28.441169]  at java.base/java.nio.channels.SocketChannel.write(SocketChannel.java:660)
[2025-05-02 18:12:28.441185]  at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:112)
[2025-05-02 18:12:28.441192]  ... 43 more
[2025-05-02 18:12:28.441199]
[2025-05-02 18:12:28.441203]

The TCP connections 1000X over time finally leading to FD exhaustion

Image

Majority of the TCP connections are between workers and are all in a CLOSING state (which tracks with the stacktrace above that there seems to be an issue in closing the jetty connections).

lrao-stripe avatar May 02 '25 22:05 lrao-stripe

We further observed that these org.eclipse.jetty.io.EofException with connection close is happening when pods are scaling up when autoscale is enable in trino workers.

hsushmitha avatar May 05 '25 12:05 hsushmitha

same problem, I upgrade Trino version from 472 to 474, if I modify config internal-communication.http2.enabled=true, will this problem be fixed ? thank you

Stongtong avatar Jun 16 '25 02:06 Stongtong

cc: @wendigo

ebyhr avatar Jul 30 '25 23:07 ebyhr

This error has stopped after upgrading trino to 476 version.

hsushmitha avatar Aug 25 '25 14:08 hsushmitha