Daemon
Daemon copied to clipboard
WIP: cl_main: iterate server query list from Com_EventLoop, fix #284
I modified the behavior of the code to iterate the server query list in the main loop to not be blocking, this was an issue that was annoying me and it would have been worst with a floodLimiter and far worst with more servers.
I added a cvar named cl_floodLimiter which is set to 0. That's because on mobile broadband network, sent packets may be lost if we flood them. I've noticed that on Coucou Networks (Edit: Free Mobile) going under 70ms between each server query leads to packets being lost, so people like me can set it to 100 or other safe value like that.
I also added some debug messages.
I now rewrote that code to iterate the server query list in the main loop. So it's not only fixing the issue happening on mobile networks, opening the server list is not blocking the UI (in fact the game) anymore.
I oppose merging the change in its current form. I don't think we should slow down server fetching for everyone on account of a single badly-behaved service provider. Maybe if the delay were 0 by default...
We may set a lower default value, this change has the benefit to give people the ability to adjust for their provider with a simple cvar. Also, it may be good to know that qstat's builtin help say it uses 5ms interval between sending packets, 500ms for server retries, and 2s for master server retries (I haven't checked the code though).
qstat also has an option for maximum simultaneous queries, but I don't know if it's set by default and which value.
Edit: I've checked in code, qstat's maximum number of simultaneous queries is set to 20 by default, and default interval is set to 5ms.
I noticed an issue with my initial code, if cl_floodLimiter was smaller than server ping time, the server would be queried until it responds… This is fixed.
I now set the default cl_floodLimiter value to 0. People like me can tweak it if needed.
I would like to see someone else than me testing this code with cl_floodLimiter set to 0 as I don't have access to non-mobile network myself.
I don't understand how this is supposed to work without cgame changes. When you click "Fetch new list", it expects to be able to rebuild the server list and then rebuild the table within the same frame.
Note: we may investigate later why, at startup, the engine believes the server addresses are bot addresses (bug uncovered by NET_AdrToStringwPort which not only converts address to string but do other checks as well). This is out of topic of this PR, but it deserves some investigation.
@slipher: I don't understand how this is supposed to work without cgame changes. When you click "Fetch new list", it expects to be able to rebuild the server list and then rebuild the table within the same frame.
The game code already knows how to populate the table as the data comes, whatever the frame. That's why this code works just by making queries wait some frames. The game code is already ready for it.
OK, I was wrong about getting the data in the same frame as the request. But looking at CG_Rocket_BuildServerList, I think it does expect to receive all data during 1 frame. It looks like what effectively happens is that it waits until 1 second after requesting the data to gather it. So if the ping took longer than 1 second to arrive, it would not show up.
I would have to see this feature in upcoming release. I want to provide a solution for people having bad network. Note: I had no problem in game while I was facing issues when browsing servers. It proves there is network bad enough to eat the master server packet flood but not the usual game server packets. Also, waiting for all the answers in one frame was pretty bad, about user interaction.
I think it does expect to receive all data during 1 frame.
Why would you think it would be still true after my patches? It's currently true for the name resolution, sadly, but why would it be for received master server packet? For what I see it processes the packets when they come.
I would have to see this feature in upcoming release. I want to provide a solution for people having bad network.
There should be no problem on bad networks in general. I tried getting the server list on a mobile network a minute ago and it worked fine. I also played Tremulous a few times on 3G networks long ago. I think your provider in particular does something stupid which breaks us. Has anyone else ever seen this issue?
The expectation to receive all data at once is in the cgame. See CG_Rocket_BuildServerList. If my understanding is correct, it will be guaranteed not to work if (numservers - 1) * cl_floodLimit > 1000. So you could try setting it to 400, and I would expect only 2 servers to show up.
If I set cl_floodLimiter the four servers are listed after having waited for 12 seconds, i.e. (numServers - 1) * cl_floodLimiter, given the last server answered immediately and there was no reason to wait more because there was no more server to query.
We may improve things by populating the list as soon there is an answer from at least one server (not waiting for all servers to have answered) or even updating existing servers without clearing the list at first, but that would be something else.
Along the problem with bad networks, the UI should not be blocking on network IO. I would really know what may be wrong in this because that works as expected and make the UI less blocking on network IO.
I understand I may miss something important in my code or have overlooked something (for example the 17 length array that cannot be larger than 16 in #1182 seems to work while it's obviously wrong). But even if “networks don't have to be bad“, we have to not block UI on network IO. We need something like this whatever the network is perfect or not.
Also, even with a perfect network, with blocking network IO the engine would freeze for minutes if we would have to browse counter strikes servers. Anyone can do the experiment with qstat to see how behaves a populated server list.
Also it's about fixing a regression, Quake3 was not blocking, neither in querying servers (something I attempt to fix there), neither in populating the server list (something that I don't attempt to fix there, but we would have to do it one day).
For information, 4 people have reported having experienced the issue of the server list being empty or almost empty on LTE link.
- illwieckz (myself)
- bmorel
- superdupont-fr
- BL0CKY_DUDE
- necessarily-equal
Freem reported it some months ago, BL0CKY_DUDE just did it on French chat and lautre confirmed having experienced it as well. BL0CKY_DUDE also said he experiences the issue with other games.
We were all using LTE connection on Free Mobile (one of the 4 LTE ISP in France) at the time we experienced the issue.
Edit: I added necessarily-equal in the list of affected people on 2022-07-26.
Is there some server listing program that works well on these connections (and does not fetch servers at a glacially slow rate)?
There is https://unvanquished.net/servers if player has proper unv:// integration, though it does not work to join a game when the game is already started because of #78.
On linux there is also XQF but it requires some config and also suffers from #78.
My main concern is that people may just think the server list is empty or there is only one server and leave with the idea it's a dead game. We got lucky today, freem talked about his bot server and one guy passing by read him and said he had not seen his server… Then we figured out what was happening, and after asking him some questions, yes he was using LTE link.
I'm interested in whether there's an alternate server browser that works so that we could investigate how it uses the network, to characterize which traffic patterns trigger problems and possibly mimic theirs.
Here I wrote: https://github.com/DaemonEngine/Daemon/pull/283#issuecomment-582256542
Also, it may be good to know that
qstat's builtin help say it uses5msinterval between sending packets,500msfor server retries, and2sfor master server retries (I haven't checked the code though).
qstatalso has an option for maximum simultaneous queries, but I don't know if it's set by default and which value.Edit: I've checked in code,
qstat's maximum number of simultaneous queries is set to20by default, and default interval is set to5ms.
Qstat is there: https://github.com/multiplay/qstat
Qstat is the tool XQF relies on.
I don't remember how well or bad Qstat behaves compared to Unvanquished anyway.
@DolceTriade had a good idea to implement retries with space between the request, rather than spacing the requests from the start. So we could get initially send requests all at once to the servers (like now). And then if they don't respond after 1 second or whatever, send a 2nd (or more?) try, but with time between the retries. Retries would be good to have anyway: I recall it being a common occurrence in Tremulous that I'd have to refresh the list once or twice before the server I was looking for showed up.
While we're at it I'd want to make a different API for getting the server info to get the whole set of updates at once... the current one has these functions that return a tiny amount of data on each call, which is unsuitable for our VM architecture which has some overhead per call.
I'm currently using that mobile connection that eats-up master server query packets as my physical link is down since two days, and my server list is empty… If I'm right both @bmorel and “lautre” (on chat) reproduced the issue with the same provider. It's expected all users of that ISP are affected, hopefully it's a internet mobile provider and this is a desktop game engine so it reduces the chance people can get the specific configuration. But anytime someone in France wants to tether its mobile connection from this ISP (there is only 4 mobile ISP in this country) to play on its laptop/desktop, the server list is empty.
When this happens I have to rely on web list (which suffered others issue like forwarding not working #79 and forwarding of connect not working #608).
If someones has a better implementation this would really be good. The idea of retries is a good one but I don't know how to do it.
So, to make this clear, let me describe the issue I had, without reading all the backlog (since I'm named).
When using a mobile phone as a modem (and not as a router, I used pppd to do the trick), with the "free" provider (free is their name), it was impossible to list servers. I do not remember if it was possible to join a game. It was possible to use the web, and likely IRC (and thus the internet?).
I do not know anything more, but could still do some testing since I have not changed neither provider nor phone. I think a proper issue should be posted, to split the discussion between the problem and the solution, since there's some discussion in there. Splitting the thing in two would help focusing on one topic.
I also have this issue, same network provider (free.fr)
Here is the dedicated issue for the topic:
- https://github.com/DaemonEngine/Daemon/issues/654
@slipher I remember long time ago you asked me to ping you after a release (I don't remember which one) so you can look at this issue. 😅️
We now know two French Mobile Internet Provider to be affected: Free Mobile and Bouygues Telecom, there is only four competitors on the French market so it means half of the French mobile market is affected. It may also affect home connections because providers may ship ADSL+LTE dual-carrier boxes for homes on poorly deserved places, and all French providers are accustomed to provide LTE fallback hardware to home users when their physical link has to be repaired for whatever reasons.
So if you have ideas to for a clean fix or a cleaner workaround, that's highly welcome! 🙂️
I started working on a server list API rewrite a bit before the 0.54 release; I'll pick that up again.
I finally understood what's going on with the server listing code. It already is capable of spreading out ping packets over multiple frames, and has some limitation on the quantity of pings at once: there can't be more than MAX_PINGREQUESTS servers currently waiting for a reply. CL_UpdateVisiblePings_f handles sending out new pings each frame whenever there is spare capacity (like CL_IterateServerQueries in this PR). So the only new variable you would actually need to implement this PR is one to record the time the last ping packet was sent out.
Anyway I will make my own PR soon supporting retries with different delays between packets.
Nice! Thanks for taking care of this! 🙂️
Superseded by #943.