[Bug] Bindings server sometimes fails to respond properly
Component
- [ ] github-workflows-kt (library with the DSL)
- [x] bindings server (https://bindings.krzeminski.it)
- [ ] I don't know
Action
for i in {01..100}; do echo $i; rm -rf ~/.m2/repository/actions/checkout*; .github/workflows/build.main.kts; done
Expected
No failures occur.
Actual
Occasionally the download fails and the retry mechanism has to kick in. For example:
[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (org.jetbrains.kotlin.org.apache.http.NoHttpResponseException) caught when processing request to {s}->https://bindings.krzeminski.it:443: The target server failed to respond
[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - Retrying request to {s}->https://bindings.krzeminski.it:443
on the 5th and 24th tries. Or:
[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - I/O exception (java.net.SocketException) caught when processing request to {s}->https://bindings.krzeminski.it:443: Connection reset
[main] INFO org.jetbrains.kotlin.org.apache.http.impl.execchain.RetryExec - Retrying request to {s}->https://bindings.krzeminski.it:443
on the 12th try.
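To put numbers on how often each failure mode shows up across the 100 iterations, the retry log lines can be tallied by exception type. This is a minimal sketch: `tallyRetryCauses` is a hypothetical helper, and the regex is derived only from the two log lines quoted above, so it may miss other message formats.

```kotlin
// Hypothetical helper, not part of github-workflows-kt.
// Matches lines such as:
//   ... RetryExec - I/O exception (java.net.SocketException) caught when processing request to ...
val retryLine = Regex("""RetryExec - I/O exception \(([^)]+)\) caught when processing request""")

// Counts how many times each exception class triggered a retry.
fun tallyRetryCauses(logLines: Sequence<String>): Map<String, Int> =
    logLines
        .mapNotNull { retryLine.find(it)?.groupValues?.get(1) }
        .groupingBy { it }
        .eachCount()

fun main() {
    // Reads the captured output of the reproduction loop from stdin.
    val tally = tallyRetryCauses(generateSequence(::readLine))
    tally.forEach { (cause, count) -> println("$count\t$cause") }
}
```

Feeding it the combined stdout and stderr of the loop gives one line per exception class, which makes runs comparable over time.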
I failed to reproduce it with a local server.
When running the script mentioned in the issue's description for more iterations, I don't see anything significantly off in the metrics. This one is going to be tricky.
Ideas for the next steps: add logs and/or metrics to catch the case where a request for an artifact is received but ktor doesn't respond. Strawman: have two metrics, one incremented when a request is received and one incremented when a response is sent. Ideally, both always show the same value. If they diverge, we've captured the problem in the metrics, which is a step forward since we'll be able to quantify it.
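The two-counter strawman could look roughly like the sketch below. It is deliberately framework-free: `RequestResponseCounters` and `track` are hypothetical names, not ktor APIs, and in the real server the two increments would have to be hooked into wherever ktor receives a call and finishes sending the response.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Hypothetical illustration of the two-metric idea; not ktor or bindings-server code.
class RequestResponseCounters {
    private val received = AtomicLong()
    private val responded = AtomicLong()

    // Counts the request on entry and the response only if the handler returns normally.
    fun <T> track(handler: () -> T): T {
        received.incrementAndGet()
        val result = handler()
        responded.incrementAndGet()
        return result
    }

    fun receivedCount(): Long = received.get()
    fun respondedCount(): Long = responded.get()

    // Requests that were received but never produced a response.
    fun divergence(): Long = received.get() - responded.get()
}

fun main() {
    val counters = RequestResponseCounters()
    counters.track { "artifact bytes" }
    // Simulate a request for which no response is ever produced.
    runCatching { counters.track<String> { throw IllegalStateException("no response") } }
    println("received=${counters.receivedCount()} responded=${counters.respondedCount()} divergence=${counters.divergence()}")
    // prints "received=2 responded=1 divergence=1"
}
```

A non-zero divergence that persists after in-flight requests have drained would be the signal to look for.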
Another idea: make ktor fail if no response is sent. I'm discussing with the community what the best way to achieve this is: https://kotlinlang.slack.com/archives/C0A974TJ9/p1747947460471269?thread_ts=1747947460.471269&cid=C0A974TJ9
We have a concrete next step suggested by ktor support: https://kotlinlang.slack.com/archives/C0A974TJ9/p1747988699947299?thread_ts=1747947460.471269&cid=C0A974TJ9 I think we need to tweak log4j2.xml to set the logging level for Netty to DEBUG somehow.
Let me add this to the logging config:
<Loggers>
    ...
    <Logger name="io.netty" level="debug">
        <AppenderRef ref="Console Appender"/>
        <AppenderRef ref="Rolling File Appender"/>
    </Logger>
</Loggers>
Just <Logger name="io.netty" level="DEBUG"/> is enough; the appenders are inherited.
Or use TRACE or ALL if Netty logs the relevant events below DEBUG and DEBUG is not enough.
I haven't seen this issue for a while. @Vampire @LeoColman have you?
Nope
It also doesn't reproduce with the command from the original report: all 100 iterations succeed. Let me resolve it for now; we can always revisit if it happens again.