github-workflows-kt

[Bug] Binding server sometimes fails to properly respond

Open • Vampire opened this issue 7 months ago • 7 comments

Component

  • [ ] github-workflows-kt (library with the DSL)
  • [x] bindings server (https://bindings.krzeminski.it)
  • [ ] I don't know

Action

    for i in {01..100}; do echo $i; rm -rf ~/.m2/repository/actions/checkout*; .github/workflows/build.main.kts; done

Expected

No failures.

Actual

Occasionally, a download fails and the retry mechanism has to kick in. For example:

    [main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (org.jetbrains.kotlin.org.apache.http.NoHttpResponseException) caught when processing request to {s}->https://bindings.krzeminski.it:443: The target server failed to respond
    [main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - Retrying request to {s}->https://bindings.krzeminski.it:443

on the 5th and 24th tries. Or:

    [main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (java.net.SocketException) caught when processing request to {s}->https://bindings.krzeminski.it:443: Connection reset
    [main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - Retrying request to {s}->https://bindings.krzeminski.it:443

on the 12th try.
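To count how often each failure type shows up over a long run of the loop, the RetryExec lines can be tallied by exception type. This is a minimal, dependency-free sketch, assuming the log lines keep exactly the shape quoted in this report. RetryTally and tally are hypothetical names and not part of github-workflows-kt.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of github-workflows-kt): counts retried I/O
// exceptions in RetryExec log lines, grouped by the exception's simple class name.
final class RetryTally {
    // Matches the "I/O exception (<fully qualified class>) caught ..." lines from the report.
    private static final Pattern IO_EXCEPTION = Pattern.compile(
            "RetryExec - I/O exception \\(([^)]+)\\) caught when processing request");

    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String line : logLines) {
            Matcher m = IO_EXCEPTION.matcher(line);
            if (m.find()) {
                String fqcn = m.group(1);
                String simpleName = fqcn.substring(fqcn.lastIndexOf('.') + 1);
                counts.merge(simpleName, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The retry causes quoted in the report; "Retrying request" lines do not match.
        List<String> lines = List.of(
                "[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (org.jetbrains.kotlin.org.apache.http.NoHttpResponseException) caught when processing request to {s}->https://bindings.krzeminski.it:443: The target server failed to respond",
                "[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - Retrying request to {s}->https://bindings.krzeminski.it:443",
                "[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (java.net.SocketException) caught when processing request to {s}->https://bindings.krzeminski.it:443: Connection reset");
        System.out.println(tally(lines)); // prints {NoHttpResponseException=1, SocketException=1}
    }
}
```

The loop output can be captured with 2>&1 before feeding it in, since it isn't established here whether the script logs to stdout or stderr.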

Vampire, May 21 '25 12:05

I failed to reproduce it with a local server.

krzema12, May 22 '25 06:05

When running the script mentioned in the issue's description for more iterations, I don't see anything significantly off in the metrics. This one is going to be tricky.

Ideas for next steps: add logs and/or metrics that catch a case where a request for an artifact is received but ktor doesn't respond. Strawman: two metrics, one incremented when a request is received and one incremented when a response is sent. Ideally, both should have the same value. If they diverge, we've managed to capture the problem in the metrics, which is a step forward since we'll be able to quantify it.
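The two-counter strawman can be sketched without any framework. This assumes plain in-process AtomicLong counters instead of whatever metrics library the bindings server actually uses, and it counts a response as "sent" once the handler has produced it, whereas a real server would ideally count at the point the response is written. ResponseCounters, handle and missingResponses are hypothetical names; this is not ktor's API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical, framework-agnostic sketch of the two-counter strawman.
// Not ktor's API: plain AtomicLongs stand in for real metrics.
final class ResponseCounters {
    // Incremented as soon as a request reaches the handler.
    static final AtomicLong requestsReceived = new AtomicLong();
    // Incremented only after the handler has produced a response body.
    static final AtomicLong responsesSent = new AtomicLong();

    static String handle(String path, Function<String, String> handler) {
        requestsReceived.incrementAndGet();
        // If the handler throws, responsesSent is never incremented for this request.
        String body = handler.apply(path);
        responsesSent.incrementAndGet();
        return body;
    }

    // Requests still in flight also show up here, so only a gap that persists is meaningful.
    static long missingResponses() {
        return requestsReceived.get() - responsesSent.get();
    }

    public static void main(String[] args) {
        handle("/artifact-a", path -> "ok"); // hypothetical path
        try {
            handle("/artifact-b", path -> { throw new IllegalStateException("no response"); });
        } catch (IllegalStateException e) {
            // The caller sees the failure; the counters record the gap.
        }
        System.out.println("received=" + requestsReceived.get()
                + " sent=" + responsesSent.get()
                + " missing=" + missingResponses()); // prints received=2 sent=1 missing=1
    }
}
```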

krzema12, May 22 '25 06:05

Another idea: make ktor fail if no response is sent. I'm discussing the best way to achieve this with the community: https://kotlinlang.slack.com/archives/C0A974TJ9/p1747947460471269?thread_ts=1747947460.471269&cid=C0A974TJ9
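The "fail if no response is sent" idea can be illustrated independently of ktor. The sketch below is a hypothetical, framework-agnostic wrapper (MustRespond and Call are invented names, not ktor types): it runs a handler and throws if the handler returns without having written a response, so the problem becomes a loud server-side error instead of a silent client-side timeout.

```java
import java.util.function.Consumer;

// Hypothetical, framework-agnostic sketch: not ktor's API.
final class MustRespond {
    // Stand-in for a call object that records whether a response was written.
    static final class Call {
        private String response;

        void respond(String body) { this.response = body; }

        boolean hasResponded() { return response != null; }

        String response() { return response; }
    }

    // Runs the handler, then fails loudly if it returned without responding.
    static Call handle(Consumer<Call> handler) {
        Call call = new Call();
        handler.accept(call);
        if (!call.hasResponded()) {
            throw new IllegalStateException("handler completed without sending a response");
        }
        return call;
    }

    public static void main(String[] args) {
        System.out.println(handle(c -> c.respond("ok")).response()); // prints ok
        try {
            handle(c -> { /* forgot to respond */ });
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Where exactly such a check would hook into ktor is what the linked Slack thread is about, so this only shows the shape of the check, not how ktor implements it.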

krzema12, May 22 '25 21:05

We have a concrete next step suggested by ktor support: https://kotlinlang.slack.com/archives/C0A974TJ9/p1747988699947299?thread_ts=1747947460.471269&cid=C0A974TJ9 . I think we need to tweak log4j2.xml somehow to set the Netty logging level to DEBUG.

krzema12, May 23 '25 08:05

Let me add this to the logging config:

    <Loggers>
        ...
        <Logger name="io.netty" level="debug">
            <AppenderRef ref="Console Appender"/>
            <AppenderRef ref="Rolling File Appender"/>
        </Logger>
    </Loggers>

krzema12, May 28 '25 07:05

Just <Logger name="io.netty" level="DEBUG"/> is enough; the appenders are inherited from the parent loggers (additivity is enabled by default).

Vampire, May 28 '25 07:05

Or use TRACE or ALL if Netty logs below DEBUG and DEBUG isn't enough.

Vampire, May 28 '25 07:05

I haven't seen this issue for a while. @Vampire @LeoColman have you?

krzema12, Sep 27 '25 21:09

Nope

LeoColman, Sep 27 '25 21:09

It also doesn't reproduce with the command from the original report: all 100 iterations succeed. Let me resolve it for now; we can always revisit if it happens again.

krzema12, Oct 09 '25 18:10