serverlist icon indicating copy to clipboard operation
serverlist copied to clipboard

Persist servers in separate database

Open ShadowNinja opened this issue 3 years ago • 7 comments

Summary of Changes

Persistent SQL Database

Servers are now stored persistently in an SQL database. This offers the following advantages:

  • Servers can now be remembered when they restart, so that things like world age can be verified. (See below)
  • Uptime can now be tracked as a percentage (not currently implemented).
  • The server list can now to maintain non-public information about servers. Previously some values used for things like tracking uptime were included in list.json because the server list didn't have anywhere else to put it. Now it can be stored in the database and not exposed.
  • The list of valid fields are now restricted. Before a server could set arbitrary additional fields and the server list would store and serve then. Multicraft used this to add a server_id field for their server. Only recognize fields will be served to clients now (although server_id has been added to allow this existing use case).
  • This allows the use of multiple server processes. Previously each process would keep its own copy of the server list in memory, which would cause servers to appear and disappear randomly when processes overwrote list.json.

Persistent server state

We need a way to match announcements to servers in the database. The way this used to be done is with the announce_ip:port pair, however this is not a good option going forward because servers can change IPs (e.g. moving to different host or using a dynamic IP).

A better option is address:port because you can maintain the same address when moving the server. However this still not ideal because you can't change the port or move to a different domain name, and without address verification it can be spoofed.

The solution is to add a world_uuid. This is stored with the world and allows the world to be moved freely and still identified. This value is kept secret to prevent spoofing of the server's announcements.

An alternative possibility would be to use a pubkey/privkey pair to identify servers, but this would be more complicated.

Fallback

address:port is used as a fallback for old servers, which should also be unspoofable with address verification enabled.

Address verification is needed to prevent spoofing with servers that do not support world_uuid, however it can fail if a server is set up for IPv4 and announces over IPv6. Last I checked there were 71/315 servers with this issue, so it's not something we just want to ignore. Therefore, Address verification has been moved into the view function and a detailed error response is returned if it fails. The response includes detailed instructions on how to fix the issue. Address verification is skipped if world_uuid is sent.

There was also one server that (minetest.zarnica.org.ua:30001) that fails the verification check for a different reason: it uses IPv4, but announces over a different IPv4 address than it listens on. Perhaps this server has multiple NICs. It doesn't seems like this is a major concern since it's just one server with an odd networking config.

Background tasks

Celery is now used to handle background tasks. This requires a message broker (Redis or RabbitMQ) to be available.

We can probably use something simpler if this is deemed too complicated. We can probably move the entire request processing into the view function to remove the need for the request processing background task. This would also allow up to serve more useful error messages if there's an issue in the final verification steps before publishing the server.

The maintenance background task could be moved to a simpler scheduler like APScheduler, or even just cron with a cli command. However, it will have to run in a separate process so that there aren't multiple schedulers if there are multiple web processes.

List changes

The following fields are no longer included in the server list:

  • ip: The IP that the announce was received from. This is only stored on the server now.
  • clients: This field was overwritten with len(clients_list) unless the server was old enough to not send a clients list. All servers on the server list currently send clients_list.
  • start: Was used to validate uptime. Kept internally now. Clients can use uptime to approximate this.
  • total_clients: Sum of clients in announce requests. Was to calculate pop_v.
  • update_time: Kept internally now.
  • updates: Number of updates received. Used to calculate pop_v. Popularity is calculated without this now.
  • liquid_finite: Removed in 0.4.10.

Also server_id was officially added. Multicraft clients set this to "multicraft". It worked before because the server list allowed arbitrary extra fields.

The list is not immediately updated when a new server announces any more, the server may not show up for up to a minute.

Ranking changes

  • "Guest" accounts are no longer down-weighted.
  • "Heavily loaded" is now limited by a server-side constant instead of relying on clients_max.
  • Points for server age increase by 2 per month instead of 1 per month. Max age boost is still 8.
  • High clients_max is no longer penalized, as it no longer meaningfully contributes to a higher ranking.
  • Ping time threshold lowered to 300ms. All pings below 300ms are treated the same to avoid penalizing a server simply for its geographical distance from the server list. Above that a penalty is applied.
  • Recent startup penalty changed from -8 to -4.

The method used to calculate popularity (pop_v) has changed. It is now rolling average (like /proc/loadavg). This is needed because with persistent server storage the more recent popularity has to be weighted more heavily so that an old server that gets more popular isn't stuck with its old low popularity.

Ping checking

Servers are pinged a few times and the lowest value is used. Pings are also retried a few times in case of packet loss.

The server list re-pings every server every minute when the list is updated, rather that just sending one ping on start and never updating it until the server restarts. This could be tweaked if that's too often.

Misc

  • There is a command to import the existing JSON server list into the new database (flask load-json list.json).
  • Integers and booleans as strings no longer supported. This was supported for compatibility with old clients (<0.4.10) and should not be an issue anymore.
  • In the initial announce all IPs that the server's address resolves to are pinged instead of just the first one. This ensures that the domain doesn't have an invalid DNS record that may cause clients to fail to connect.
  • Use Pipfile instead of requirements.txt.
  • Split server list into separate files. This is necessary to keep everything organized now.
  • Changed casing to snake_case.
  • README expanded.

ShadowNinja avatar Jun 13 '21 20:06 ShadowNinja

Last I checked there were 71/315 servers with this issue, so it's not something we just want to ignore. Therefore, Address verification has been moved into the view function and a detailed error response is returned if it fails.

But we can't just put the stricter validation into effect soon. Server owners cannot be properly notified until they have upgraded to 5.5.0, which will take some time even after it is released.

The server list re-pings every server every minute when the list is updated, rather that just sending one ping on start and never updating it until the server restarts.

I'm not convinced we need to update the ping between the 5 minutes a server usually announces.

Ranking changes

These sound mostly fine to me but it'd be better to move these into a future PR so that this one doesn't get tangled in discussions about ranking changes.

Celery is now used to handle background tasks. This requires a message broker (Redis or RabbitMQ) to be available. We can probably use something simpler if this is deemed too complicated. We can probably move the entire request processing into the view function to remove the need for the request processing background task.

Not sure if I like Celery yet, it's probably fine. Though if it's done simpler (just start a background thread from Flask?) I wouldn't mind. Moving processing into the requests isn't a good idea and not just for the obvious reasons. Minetest will currently(?) abort cURL requests after 5 seconds which might be too tight to do all the work.

Use Pipfile instead of requirements.txt.

pip can still read these and they not specific to Pipenv, are they?

sfan5 avatar Jul 10 '21 13:07 sfan5

ContentDB uses Celery and SQLAlchemy - I recommend both. The problem with a background thread is when you have multiple web workers, you can end up potentially duplicating work

Also, I would suggest as an alternative to generating list.json to instead make it a Flask endpoint. You can then cache in Nginx or Apache. This feels cleaner

rubenwardy avatar Jul 10 '21 18:07 rubenwardy

We can't just put the stricter validation into effect soon. Server owners cannot be properly notified until they have upgraded to 5.5.0, which will take some time even after it is released.

I have an idea to solve this: We could try to verify the server and if it succeeds set an address_verification_required bit in the database. This way servers with a compatible configuration still get spoof protection.

I'm not convinced we need to update the ping between the 5 minutes a server usually announces.

I could bump this to 5 minutes.

[Ranking changes] sound mostly fine to me but it'd be better to move these into a future PR so that this one doesn't get tangled in discussions about ranking changes.

OK. The popularity change will still be needed, but the rest can be separated.

pip can still read [Pipfiles] and they not specific to Pipenv, are they?

No, you need pipenv. I could also include a requirements.txt.

ShadowNinja avatar Jul 10 '21 20:07 ShadowNinja

I have an idea to solve this: We could try to verify the server and if it succeeds set an address_verification_required bit in the database. This way servers with a compatible configuration still get spoof protection.

Good idea. Only side effect (that could result in more support questions) is that announces are no longer deterministic, if you pass domain verification once you may never fail it again.

I could bump this to 5 minutes.

Rather just remove that code entirely then? Servers are pinged on announce and those happen every 5 minutes.

sfan5 avatar Jul 11 '21 09:07 sfan5

Only side effect (that could result in more support questions) is that announces are no longer deterministic, if you pass domain verification once you may never fail it again.

Yes, not sure if there's much we can really do about this. It seems unlikely that a broken config would get fixed and then break again, so hopefully this won't be an issue in practice.

Servers are pinged on announce and those happen every 5 minutes.

Good point. I've removed the periodic server-initiated ping.

ShadowNinja avatar Aug 06 '21 00:08 ShadowNinja

Can you rebase this? I want to put this to the test by mirroring all of the requests to it.

sfan5 avatar Feb 20 '22 19:02 sfan5

@sfan5 Rebased.

ShadowNinja avatar Oct 17 '23 01:10 ShadowNinja