s2n-quic server fails resumption interop test with neqo client

Open goatgoose opened this issue 10 months ago • 5 comments

Problem:

The resumption interop test has recently started failing with the neqo client and s2n-quic server. neqo has released a fix related to this issue (https://github.com/mozilla/neqo/pull/1837), but the interop test is still failing with neqo and s2n-quic.

Solution:

We should investigate the cause of this test failure. If the issue can be addressed in s2n-quic, we should resolve it and revert https://github.com/aws/s2n-quic/pull/2191 to enforce the resumption test with neqo and s2n-quic in CI.

goatgoose avatar Apr 26 '24 20:04 goatgoose

Sorry for the trouble here @goatgoose.

https://github.com/mozilla/neqo/pull/1837 fixed the issue.

The resumption testcase using neqo client s2n-quic server is no longer failing. See e.g. recent CI run:

https://github.com/mozilla/neqo/pull/1857#issuecomment-2077536804

The neqo-qns Docker image is published nightly, thus reverting https://github.com/aws/s2n-quic/pull/2191 should succeed now.

mxinden avatar Apr 29 '24 10:04 mxinden

Hi @mxinden, it appears that even after the https://github.com/mozilla/neqo/pull/1837 fix, the neqo client and s2n-quic server still fail the resumption test. From https://github.com/mozilla/neqo/pull/1857#issuecomment-2077536804:

Failed Interop Tests
neqo-latest vs. s2n-quic: R A

However, looking at the interop runner, it seems like https://github.com/mozilla/neqo/pull/1837 fixed the issue for all implementations except for s2n-quic: resumption_interop

So we plan to investigate this to see if s2n-quic is causing this issue.

goatgoose avatar Apr 29 '24 14:04 goatgoose

I'm looking at the download of the second URL in https://interop.seemann.io/logs/2024-07-25T16:32/s2n-quic_neqo/resumption/output.txt.

One thing that looks odd from neqo's perspective is that we're receiving HandshakeDone from the server many times after ACK'ing it in our packet 3.

Otherwise, I can't tell from the log why we wouldn't send a ConnectionClose. Is there any chance you can make a linux/arm64 docker image available? I unfortunately can't run amd64 locally.

larseggert avatar Jul 26 '24 16:07 larseggert

> One thing that looks odd from neqo's perspective is that we're receiving HandshakeDone from the server many times after ACK'ing it in our packet 3.

This is expected. s2n-quic sends HandshakeDone very aggressively (with every outgoing packet) until it receives an acknowledgement for it, to ensure the client is not blocked on the handshake.
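As a rough illustration of that behavior, here is a minimal sketch of a sender that attaches the frame to every outgoing packet until any one copy is acknowledged. This is not s2n-quic's actual code: the `HandshakeDoneSender` type and its `on_transmit` / `on_ack` methods are hypothetical names made up for this example. Since an ACK takes at least a round trip to arrive, a client would plausibly see several copies even after it has acknowledged the first one.

```rust
/// Hypothetical sketch (NOT s2n-quic's real implementation) of a sender that
/// repeats HANDSHAKE_DONE in every outgoing packet until one copy is acknowledged.
#[derive(Debug, Default)]
pub struct HandshakeDoneSender {
    /// Packet numbers that carried the frame and have not been acknowledged yet.
    in_flight: Vec<u64>,
    /// Set once any packet carrying the frame has been acknowledged.
    acked: bool,
}

impl HandshakeDoneSender {
    /// Called while building each outgoing packet. Returns true if the frame
    /// should be written into the packet numbered `packet_number`.
    pub fn on_transmit(&mut self, packet_number: u64) -> bool {
        if self.acked {
            return false;
        }
        // Remember which packet carried the frame so a later ACK can be matched.
        self.in_flight.push(packet_number);
        true
    }

    /// Called when the peer acknowledges `packet_number`.
    pub fn on_ack(&mut self, packet_number: u64) {
        if self.in_flight.contains(&packet_number) {
            // One acknowledged copy is enough; stop sending the frame.
            self.acked = true;
            self.in_flight.clear();
        }
    }

    /// True once the peer has confirmed receipt of the frame.
    pub fn is_acked(&self) -> bool {
        self.acked
    }
}
```

In this sketch, packets sent before the ACK arrives all carry the frame, which would match the repeated HandshakeDone frames visible in the client log.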

> Is there any chance you can make a linux/arm64 docker image available? I unfortunately can't run amd64 locally.

I'm currently having some trouble building on arm64, so this might take some time.

WesleyRosenblum avatar Jul 29 '24 20:07 WesleyRosenblum

This was a neqo bug: https://github.com/mozilla/neqo/pull/2067

larseggert avatar Aug 21 '24 13:08 larseggert