Redo #1784
@KershawChang said in #1800:
It appears that PR #1784 caused some test failures (failed tests on windows). After reverting it, all http3 tests are passed.
We should find out why those treeherder tests failed, and write a neqo unittest that replicates that failure. And then take another stab at #1784.
Sorry for the trouble @kershawchang.
Do I understand correctly, that the failing tests all use https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs?
Do I understand correctly, that the failing tests all use https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs?
Yes, that's correct. Note that those tests are only failed on windows platform, so I assume this could be a timing issue, which makes it not easy to debug. Or it could be the fault of our flaky test server that needs to be modified.
If I understand correctly the GitHub HTTP3 server implementation (neqo-bin/server/mod.rs) and the mozilla-central HTTP3 server implementation have a lot in common, e.g.:
- wrapper logic around
neqo-http3drivingneqo_http3::Http3Server - UDP I/O logic
What is the long term plan for the two? Should they simply co-exist as they do today? Should they merge? Should they share components?
Good topic to discuss on the next call. I don't have history of why the current state is as it is, but I'd personally like to reduce duplication as much as possible, and ideally have most of the logic in the neqo repo where we can more easily update and test it.
(We should probably add a CI step to build Fx with PRs that propose changes to neqo, and ideally also execute some Fx-level tests.)
What is the long term plan for the two? Should they simply co-exist as they do today? Should they merge? Should they share components?
It's probably not easy to merge the two, but sharing some components and reducing the duplicate code definitely makes sense to me.
Good topic to discuss on the next call.
Added to the agenda document :heavy_check_mark:.
I’ve investigated this issue and believe it is caused by a design flaw in our test http3server. Specifically, with the changes introduced in #1784, the Http3Server::process method sometimes returns Output::None. This behavior is problematic for the http3server, which depends on the process_datagrams_and_events method being called periodically. However, process_datagrams_and_events is only triggered when a timer expires or a packet is received. This could lead to stalls.
I think the right fix would be to refactor the http3server such that process_datagrams_and_events is called when there is data to send.
With https://github.com/mozilla/neqo/pull/1878 merged and https://bugzilla.mozilla.org/show_bug.cgi?id=1895319 closed, this issue can be closed as well. Would you mind @larseggert?