java.net.SocketException in SHUTDOWN phase while AWS Lambda (Quarkus + GraalVM) is connected with AwsAppConfigExtension

Open DudekJakub opened this issue 3 months ago • 11 comments

Describe the bug

Hello everybody.

I'm struggling with an AWS Lambda function set up as follows:

  • run with: Quarkus (3.26.2)
  • built with: GraalVM Oracle (21.0.7)
  • architecture: arm64
  • runtime: Amazon Linux 2
  • handler: io.quarkus.amazon.lambda.runtime.QuarkusStreamHandler::handleRequest

with pinned extension layer:

  • name: AWS-AppConfig-Extension-Arm64
  • version: 132
  • compatible-architectures: arm64

This is the extension for AWS AppConfig. The Lambda itself does not contain any code related to AppConfig (no connection logic, etc.). It is just a clean build with the following dependencies:

  • implementation("io.quarkus:quarkus-amazon-lambda")
  • implementation("software.amazon.awssdk:url-connection-client")

and the simplest requestHandler class:

import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.RequestHandler
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent
import com.test.logging.Logging

class Lambda : RequestHandler<DynamodbEvent, String> {
  private val logger = Logging.logger { }

  override fun handleRequest(
    event: DynamodbEvent,
    context: Context
  ): String {
    logger.info { "Everything OK" }
    return "OK"
  }
}

Whenever the Lambda is invoked, everything runs normally until the SHUTDOWN phase, where a "java.net.SocketException" occurs:

{
    "@timestamp": "2025-09-08T15:28:52.825Z",
    "log.level": "ERROR",
    "process.thread.name": "Lambda Thread (NORMAL)",
    "error.stack_trace": "java.net.SocketException: Socket closed\n\tat java.base@21.0.7/sun.nio.ch.NioSocketImpl.endRead(NioSocketImpl.java:243)\n\tat java.base@21.0.7/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)\n\tat java.base@21.0.7/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)\n\tat java.base@21.0.7/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)\n\tat java.base@21.0.7/java.net.Socket$SocketInputStream.read(Socket.java:1099)\n\tat java.base@21.0.7/java.io.BufferedInputStream.fill(BufferedInputStream.java:291)\n\tat java.base@21.0.7/java.io.BufferedInputStream.read1(BufferedInputStream.java:347)\n\tat java.base@21.0.7/java.io.BufferedInputStream.implRead(BufferedInputStream.java:420)\n\tat java.base@21.0.7/java.io.BufferedInputStream.read(BufferedInputStream.java:399)\n\tat java.base@21.0.7/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:827)\n\tat java.base@21.0.7/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:759)\n\tat java.base@21.0.7/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1706)\n\tat java.base@21.0.7/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615)\n\tat java.base@21.0.7/sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:3251)\n\tat io.quarkus.amazon.lambda.runtime.AbstractLambdaPollLoop$1.run(AbstractLambdaPollLoop.java:95)\n\tat java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)\n\tat java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)\n\tat org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:896)\n\tat org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:872)\n",
    "error.type": "java.net.SocketException",
    "error.message": "Socket closed",
    "message": "Error running lambda (NORMAL) [Error Occurred After Shutdown]",
    "ecs.version": "1.12.1"
}
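For readers unfamiliar with this exception, here is a minimal sketch of one way "Socket closed" can arise: a thread is blocked in a socket read (as the top frames of the trace show) while another thread closes that socket. This is not the Quarkus code. The class name SocketClosedDemo, the thread name, and the local never-responding server are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Quarkus or the AWS SDK.
final class SocketClosedDemo {

    /** Blocks a reader thread on a socket, closes the socket, and returns what the reader caught. */
    static Throwable readUntilClosed() {
        AtomicReference<Throwable> caught = new AtomicReference<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The local server accepts the connection but never sends a byte,
        // so the client-side read() blocks like a long poll waiting for work.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            CountDownLatch started = new CountDownLatch(1);
            Thread reader = new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    started.countDown();
                    in.read(); // blocks until data arrives or the socket is closed
                } catch (IOException e) {
                    caught.set(e);
                }
            }, "poll-loop-stand-in");
            reader.start();
            started.await();
            Thread.sleep(200);  // give the reader time to block in read()
            client.close();     // stand-in for shutdown closing the connection
            reader.join(5000);
        } catch (Exception e) {
            throw new IllegalStateException("demo setup failed", e);
        }
        return caught.get();
    }

    public static void main(String[] args) {
        // Prints the exception the blocked reader received.
        System.out.println(readUntilClosed());
    }
}
```

If the same mechanism is at work in the Lambda, the exception would be a side effect of the connection being torn down during shutdown rather than a failure of an invocation, but that remains a guess until confirmed against the runtime code.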

Expected behavior

The poll loop should quietly terminate during container shutdown without logging SocketException: Socket closed as an ERROR.
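To make the expectation concrete, here is a minimal sketch of the kind of check a poll loop could apply before logging: treat a SocketException as expected once shutdown has begun, and as an error otherwise. The class ShutdownAwarePollLoop, its methods, and the Severity enum are hypothetical names chosen for illustration. This is not how AbstractLambdaPollLoop is implemented.

```java
import java.net.SocketException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration, not the Quarkus implementation.
final class ShutdownAwarePollLoop {

    enum Severity { DEBUG, ERROR }

    // Set by the shutdown path before connections are closed.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    void beginShutdown() {
        shuttingDown.set(true);
    }

    /** Decides how loudly a failure caught by the poll loop should be logged. */
    Severity classify(Throwable failure) {
        // A closed socket is the expected way for a blocked poll to end
        // once shutdown has started; anything else is still an error.
        boolean expected = shuttingDown.get() && failure instanceof SocketException;
        return expected ? Severity.DEBUG : Severity.ERROR;
    }

    public static void main(String[] args) {
        ShutdownAwarePollLoop loop = new ShutdownAwarePollLoop();
        SocketException closed = new SocketException("Socket closed");
        System.out.println(loop.classify(closed)); // prints ERROR
        loop.beginShutdown();
        System.out.println(loop.classify(closed)); // prints DEBUG
    }
}
```

The flag has to be set before the connection is closed. Otherwise the reader thread can observe the exception first and still report it as an error.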

Actual behavior

With the AWS AppConfig Extension attached, the poll loop logs SocketException: Socket closed in the Quarkus stack trace during container shutdown.

Below are two screenshots of the CloudWatch logs covering the Lambda's whole lifecycle:

Image

and then, after the container has been IDLE for some minutes, it goes into the SHUTDOWN phase:

Image

The black boxes just hide my company's sensitive data; those log lines are completely unrelated to the issue.

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

21.0.7

Mandrel or GraalVM version (if different from Java)

GraalVM Oracle 21.0.7

Quarkus version or git rev

3.26.2

Build tool (ie. output of mvnw --version or gradlew --version)

8.14

Additional information

If I remove the extension layer, the exception does not occur.

The issue occurs only in the SHUTDOWN phase of the Lambda's lifecycle:

Image

and it does not interrupt the earlier phases in any way. The whole logic of the INIT phase and of multiple INVOKE phases runs correctly.

Apparently something happens when io.quarkus.amazon.lambda.runtime.AbstractLambdaPollLoop tries to reach the AppConfigAgent (the layer) asynchronously while the extension itself is being shut down. Both happen at basically the same moment (you can check the timestamps). This is my only guess.

Could you please help me with this issue? Even though the INIT and INVOKE phases are not affected and the container ultimately shuts down correctly, I'd like to understand why this happens and how to avoid it. Maybe it can be ignored as harmless behavior, but I'm still quite concerned.

DudekJakub avatar Sep 09 '25 07:09 DudekJakub

/cc @Karm (native-image), @galderz (native-image), @patriot1burke (amazon-lambda), @radcortez (config), @zakkak (native-image)

quarkus-bot[bot] avatar Sep 09 '25 07:09 quarkus-bot[bot]

@DudekJakub I see you are using GraalVM, does this issue appear when you compile your application to native or in JVM-mode as well?

zakkak avatar Sep 09 '25 07:09 zakkak

@zakkak we deploy AWS Lambdas with GraalVM and Docker containers with the JVM, so we do not have a comparison.

adampoplawski avatar Sep 09 '25 08:09 adampoplawski

Are you building a native executable?

zakkak avatar Sep 09 '25 08:09 zakkak

Yes. We build with ./gradlew build -Dquarkus.native.enabled=true and use function.zip as the artifact.

adampoplawski avatar Sep 09 '25 08:09 adampoplawski

We have the same issue, but it doesn't seem to be specific to the AwsAppConfigExtension layer.

We also build native executables for Lambda; in our case, the layer is the OpenTelemetry Collector Lambda layer.

dagrammy avatar Sep 22 '25 11:09 dagrammy

We are experiencing the same problem. It also happens with our natively compiled Lambdas when using an OTel collector reporting to a Lambda layer.

Harriebo avatar Sep 23 '25 12:09 Harriebo

Does anyone have a reproducer with OTel collector that I can use to see the problem in action?

geoand avatar Oct 21 '25 07:10 geoand

Closing for lack of a reproducer. If one becomes available, we can certainly reexamine the issue.

geoand avatar Nov 14 '25 07:11 geoand

@geoand what kind of reproducer do you expect? Is something that fails in the cloud enough (given that you do not have a test or load test config for it in the project)?

adampoplawski avatar Nov 17 '25 13:11 adampoplawski

Hi @geoand and others,

I was able to create a simple reproducer project using the AWS lambda tutorial with an AWS AppConfig layer.

The reproducer is available here: https://github.com/dagrammy/lambda-layer-reproducer and the readme describes how to build the project and deploy it to AWS.

One important thing I noticed during development, which is also mentioned in the readme file, is that I was only able to reproduce the error after increasing the Lambda memory. With 256MB, the error could not be reproduced, but with 1GB, for example, it could.

My guess is that the application/Lambda shuts down faster than the layer when the Lambda has more memory:

  • 1GB: lambda-layer-reproducer stopped in 0.002s
  • 256MB: lambda-layer-reproducer stopped in 0.016s

Hope this reproducer helps :)

dagrammy avatar Nov 17 '25 15:11 dagrammy