neqo Redo #1784

@KershawChang said in #1800:

It appears that PR #1784 caused some test failures (failed tests on windows). After reverting it, all http3 tests are passed.

We should find out why those treeherder tests failed, and write a neqo unittest that replicates that failure. And then take another stab at #1784.

Apr 09 '24 05:04 larseggert

Sorry for the trouble @kershawchang.

Do I understand correctly, that the failing tests all use https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs?

Apr 09 '24 07:04 mxinden

Do I understand correctly, that the failing tests all use https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs?

Yes, that's correct. Note that those tests are only failed on windows platform, so I assume this could be a timing issue, which makes it not easy to debug. Or it could be the fault of our flaky test server that needs to be modified.

Apr 09 '24 07:04 KershawChang

If I understand correctly the GitHub HTTP3 server implementation (neqo-bin/server/mod.rs) and the mozilla-central HTTP3 server implementation have a lot in common, e.g.:

wrapper logic around neqo-http3 driving neqo_http3::Http3Server
UDP I/O logic

What is the long term plan for the two? Should they simply co-exist as they do today? Should they merge? Should they share components?

Apr 12 '24 08:04 mxinden

Good topic to discuss on the next call. I don't have history of why the current state is as it is, but I'd personally like to reduce duplication as much as possible, and ideally have most of the logic in the neqo repo where we can more easily update and test it.

(We should probably add a CI step to build Fx with PRs that propose changes to neqo, and ideally also execute some Fx-level tests.)

Apr 15 '24 05:04 larseggert

What is the long term plan for the two? Should they simply co-exist as they do today? Should they merge? Should they share components?

It's probably not easy to merge the two, but sharing some components and reducing the duplicate code definitely makes sense to me.

Apr 15 '24 07:04 KershawChang

Good topic to discuss on the next call.

Added to the agenda document :heavy_check_mark:.

Apr 15 '24 08:04 mxinden

I’ve investigated this issue and believe it is caused by a design flaw in our test http3server. Specifically, with the changes introduced in #1784, the Http3Server::process method sometimes returns Output::None. This behavior is problematic for the http3server, which depends on the process_datagrams_and_events method being called periodically. However, process_datagrams_and_events is only triggered when a timer expires or a packet is received. This could lead to stalls.

I think the right fix would be to refactor the http3server such that process_datagrams_and_events is called when there is data to send.

Apr 22 '24 14:04 KershawChang

With https://github.com/mozilla/neqo/pull/1878 merged and https://bugzilla.mozilla.org/show_bug.cgi?id=1895319 closed, this issue can be closed as well. Would you mind @larseggert?

May 27 '24 10:05 mxinden