RFC: Lobby Version switch using NGINX
Proposal:
- Start deploying all lobby versions ('2.5', '2.6', '2.7', '3.0', etc.) to the one lobby server, and keep them running until we decide to turn off the older ones
- Have every even release number use a 'beta' DB (eg: 2.6, 2.8, 3.0, 3.2), and every odd release number use the prod DB (eg: 2.5, 2.7, 3.1)
- When we mark a game client as a 'release', we increment the release version number, thereby triggering a new lobby; the latest clients then send the new version number and are routed to that new lobby (see the sketch below)
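For illustration, a minimal sketch of what that per-version layout could look like in nginx terms. The ports, upstream names, and exact DB split are assumptions, not settled decisions:

```nginx
# Illustrative only: one lobby process per release, all on the one server.
# Per the proposal, even releases would point at the beta DB and odd
# releases at the prod DB; ports are made up.
upstream lobby_2_5 { server 127.0.0.1:5025; }  # odd  -> prod DB
upstream lobby_2_6 { server 127.0.0.1:5026; }  # even -> beta DB
upstream lobby_2_7 { server 127.0.0.1:5027; }  # odd  -> prod DB
```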
Notable changes:
- beta (AKA unstable) lobby would be hosted on same server as the prod lobby
- we would remove the prerelease linode server (savings of $5/month)
Notable drawbacks:
- we would only have one prod nginx; it is a single point of failure, and misconfigurations there would impact everything
Diagram
Here is what the system would look like (all on one box):
Other notes:
- we would want to update the client code so that, if it gets no response, it prompts the user to download the latest version
- the version out-of-date check should be updated to use the GitHub API to find the latest release version
Comments
- the server load on production is pretty low, we could have quite a few lobbies running there and create a new DB schema on the same server
@DanVanAtta I think the proposed change sounds good, however I'm not sure how you're planning the distribution mechanism. IIRC you were thinking about using headers to do the logic, but to be honest that just sounds like a custom version of virtual hosts.
For example, making a request to `3-4.lobby.triplea-game.org` automatically sets the `Host` header to the domain, making the configuration pretty trivial.
Pros of this approach
- Simple to implement for clients, nginx config is also straightforward
- Potentially allows us to distribute versions onto several servers without too much effort if we ever start noticing the load is too heavy
- If you're completely crazy you could also think about a domain setup like `*.3.lobby.triplea-game.org` and `*.2.lobby.triplea-game.org` and so on to potentially distribute load for several major versions, but that's most likely overkill even though it's a possibility
Cons
- Potentially more complicated setup for certificates and DNS entries. Namecheap and letsencrypt should support wildcard domains, but it's still some one-time extra effort
We are trying to solve request routing (or arguably service discovery) more so than we are trying to achieve virtual hosts.
Routing based on headers is already in place, and we could easily redirect requests to arbitrary hosts by templating the 'localhost' portion of the redirect destinations (see the fragment sketched after these links):
- https://github.com/triplea-game/triplea/blob/master/infrastructure/ansible/roles/nginx/defaults/main.yml#L18
- https://github.com/triplea-game/triplea/blob/master/infrastructure/ansible/roles/nginx/templates/etc_nginx_sites_enabled_default.j2#L36
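For example, a hypothetical fragment in the spirit of the jinja2 template linked above. The variable names here are illustrative; the real ones live in the ansible defaults:

```nginx
# Hypothetical fragment of the nginx sites-enabled jinja2 template: the
# destination is a template variable rather than a hard-coded 'localhost',
# so requests can be redirected to arbitrary hosts.
location /lobby {
    proxy_pass http://{{ lobby_host }}:{{ lobby_port }};
}
```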
Previously, routing was done entirely client side: first parsing an index file (https://github.com/triplea-game/triplea/blob/master/servers.yml), running the current client version through a switch, and then selecting the right host for future requests based on that switch result.
This routing logic is moving to server-side (ie: NGINX config) and how we do that routing in NGINX can be done in multiple ways.
DNS Based Routing
We could have multiple DNS entries referring to the same (NGINX) routing machine. NGINX could either have multiple server blocks or a switch statement using the `Host` header. Note that a switch statement based on a `Host` header and one based on a `Version` header are almost the same thing.
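As a rough sketch of the 'switch statement' form, with made-up hostnames and ports:

```nginx
# One server block covers the whole wildcard domain; the map directive
# 'switches' on the Host header to pick a backend.
map $host $lobby_backend {
    2-5.lobby.triplea-game.org  http://127.0.0.1:5025;
    2-6.lobby.triplea-game.org  http://127.0.0.1:5026;
    default                     http://127.0.0.1:5025;
}

server {
    listen 80;
    server_name *.lobby.triplea-game.org;
    location / {
        proxy_pass $lobby_backend;
    }
}
```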
Though, there are some significant costs to DNS based routing. (1) We require many DNS entries and have to manage them (more on wildcarding later); DNS access is limited, which worsens the problem of "I go on vacation and the TripleA project comes to a complete and full stop." (2) Testing this configuration requires hacking a developer machine's DNS; the net effect is that it is more difficult for others to work on new lobby versions. (3) Deployment is not fully programmatic; we would require manual steps to set up new DNS entries and equally manual steps to remove old ones. (4) DNS is slow to change.
Version Header Based Routing
On the other hand, we could have just one DNS entry and use a 'version' header as the item in the NGINX switch block when determining the correct final host to service a request.
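A sketch of what that could look like. Note it differs from the `Host`-header version only in the map key; the header name 'Triplea-Version' and the ports are assumptions, not settled names:

```nginx
# nginx exposes a request header 'Triplea-Version' as $http_triplea_version.
map $http_triplea_version $lobby_backend {
    2.5      http://127.0.0.1:5025;
    2.6      http://127.0.0.1:5026;
    default  http://127.0.0.1:5025;  # fallback; could instead be a 'please update' stub
}

server {
    listen 80;
    server_name lobby.triplea-game.org;
    location / {
        proxy_pass $lobby_backend;
    }
}
```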
Pros of DNS based routing vs Header based routing
> Simple to implement for clients, nginx config is also straightforward
The header based routing is almost the same regardless of which choice we make. Having multiple server blocks is arguably more complex (but more flexible); either way, this advantage is nearly the same for both options.
> Potentially allows us to distribute versions onto several servers without too much effort if we ever start noticing the load is too heavy
This is the same with either approach. We can route requests to arbitrary servers, and it's pretty easy to add a load balancer config to NGINX, which would be the same regardless of whether we switch on the `Host` header or a `Version` header (see the sketch below). Though, load balancing is not what we are trying to achieve; it turns out we cannot do load balancing, because each lobby version must be a single instance.
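For what it's worth, the load-balancing piece really is small. A sketch with made-up addresses, though as just noted it is moot for the lobby because each version must be a single instance:

```nginx
# Hypothetical: fan one version's traffic out over two hosts.
# nginx round-robins across the servers in an upstream by default.
upstream lobby_2_6 {
    server 10.0.0.5:5026;
    server 10.0.0.6:5026;
}

server {
    server_name lobby.triplea-game.org;
    location / {
        proxy_pass http://lobby_2_6;
    }
}
```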
Server cardinality & request routing, benefits
What we want is one lobby instance for each client version. Wildcard DNS entries would not work because 2.6 clients should be routed to the 2.6 lobby, 2.7 clients to the 2.7 lobby, etc. It would not be the case that any 2.x client goes to a generic 2.x lobby. (If we did wildcarding, there is also the question of how we split 'beta' traffic from the 'prod' versions, but this question is a bit moot given the misunderstanding of lobby instance cardinality.)
Instead, by having the release versions be 'pinned' between client and lobby, we never have to worry about backward compatibility. This is a main benefit. We simply leave the older instances running, and when they start seeing close to zero traffic, we turn them off. To emphasize, this means we never have to worry about a 2.5 client working with a 2.7 lobby; we can completely change the APIs in 2.7 and not worry at all about previous client versions. The only backward compatibility concern we would have is in the database.
Another big benefit is that this configuration is completely programmatic: there is no manual configuration, and changes can be made entirely via pull request to the 'configuration-as-code'.
Last, clients would only need one stable DNS name forever, for any version. The DNS setup becomes a one-time operation, and if we want to migrate everything to a new server stack, it is just one DNS entry to update.
> (3) Deployment is not fully programmatic; we would require manual steps to set up new DNS entries and equally manual steps to remove old ones

This is only partially true: there's the concept of wildcard DNS entries, where all `*.lobby.triplea-game.org` hostnames are redirected to the same server.
@RoiEXLab in such a case we have a single wildcard domain - if we have clients sending requests to URLs like `2-6.lobby.triplea-game.org`, and then we have an 'if/else' statement that switches on the `Host` header, is that really much different from having clients send the version in a header? From a systems perspective, both are switching based on a header value, and both require the client to inject the version somewhere - I think the biggest differences would be in local development and test configuration.
A different aspect to this topic: I'm wondering if we even really need a 'beta' DB?
@DanVanAtta That's precisely my point. It sounds like the same concept, so I was just wondering if we're reinventing the wheel here. By using virtual hosts we're making use of a well-established concept instead of a custom solution.
Local development and testing is a valid point though. I believe the `Host` header can be set as usual here to achieve the same thing, but I'm not 100% sure.
Both approaches have their pros and cons, but in the end they're almost identical.
If reinventing the wheel, which tools are there that do what we want?
Virtual hosts map many DNS names to one host; we want the inverse (so this is a lot more akin to routing and load balancing).
@RoiEXLab also what are your thoughts on the necessity of a preprod database?
CC: @tvleavitt, @bacrossland, y'all might be interested in this topic; feedback is welcome, particularly regarding any ways we can do this more simply and any potential pitfalls you could foresee.
> If reinventing the wheel, which tools are there that do what we want?
I was mainly thinking about nginx's server blocks. I was imagining an nginx config along the lines of this:
```nginx
server {
    include common.conf;
    server_name 2-4.lobby.triplea-game.org;
    location / {
        proxy_pass http://localhost:1337;
    }
}
# repeat for every instance with different port and server_name
```
However, I just read this article and learned that the `map` directive exists, making this pretty convenient regardless. So it doesn't really matter what kind of header is used: a custom header could be used (like they do in the article), or alternatively the `Host` header. It's just a preference whether you want to make it seem like there are many servers from the outside, or hide them completely behind a single reverse proxy.
One thing that would also be possible in theory is to have some sort of versioning already built into the URL path. So `https://lobby.triplea-game.org/2.4/*` would actually forward to `http://localhost:8080/*`, and likewise for other versions. But I'm sure this approach has its own bunch of problems, because now the root of the URL is no longer fixed, so I assume that's why it wasn't considered. Nginx is designed to achieve just this: using this approach we could just chain `location` blocks with `proxy_pass` directives and get a simple but expressive config. Just wanted to mention it for completeness.
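A sketch of that path-prefix variant, with made-up ports. When `proxy_pass` carries a URI part, nginx replaces the matched location prefix, so a request for `/2.4/foo` is forwarded as `/foo`:

```nginx
server {
    server_name lobby.triplea-game.org;

    # /2.4/* -> the 2.4 lobby, version prefix stripped
    location /2.4/ {
        proxy_pass http://127.0.0.1:8080/;
    }

    # /2.5/* -> the 2.5 lobby
    location /2.5/ {
        proxy_pass http://127.0.0.1:8081/;
    }
}
```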
Regarding a preprod database: I think there are some scenarios where it can be useful. This includes being able to find database migration issues that slipped through testing due to a small test set, as well as having a playground to identify and debug reported issues that are not reproducible locally (maybe there's an issue that only occurs if the server's and client's timezones differ). So it can be useful if there's actually data in the preprod database, but the better our testing, code reviews, and QA are, the less useful it becomes, I think.
I think we are on pretty safe ground in terms of 'reinventing the wheel'. The 'wheel' in this case is using NGINX to do request routing based on headers. The overall system design & configuration, though, is the thing we need to build.
Re: version in URL
A rule of thumb I've adopted is to avoid versions in URLs. Essentially, a URL is the name of a resource, and it is an extremely long-lived part of that entity. Using the URL for something with a far shorter life span than the resource creates a mismatch.
FWIW, there are a number of discussions/links supporting this perspective:
- https://stackoverflow.com/questions/972226/how-to-version-rest-uris
- https://sookocheff.com/post/api/how-to-version-a-rest-api/
- https://medium.com/@XenoSnowFox/youre-thinking-about-api-versioning-in-the-wrong-way-6c656c1c516b
I listened to a talk (that I am still trying to find) that described, IIRC, both that version values should go into headers and be routed on, and that these versions should be pinned to older, still-running instances, thereby avoiding the backward compatibility scenario.
Version based header routing, a plus for local development
Version based header routing makes local testing quite feasible: we just spin up a docker container with nginx and the needed routing config file, then send requests to localhost and verify they get routed correctly (a minimal config is sketched below). We are currently able to set up every other component locally, implying we would still be able to simulate the full stack on a local developer machine.
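A minimal, self-contained config for such a local test. The header name, ports, and docker invocation are assumptions for illustration:

```nginx
# Save as nginx.conf and run, e.g.:
#   docker run --rm -p 8080:8080 \
#     -v $PWD/nginx.conf:/etc/nginx/nginx.conf:ro nginx
# Then exercise the routing with:
#   curl -H 'Triplea-Version: 2.6' http://localhost:8080/
# (point the backend ports at local stub servers to see where requests land)
events {}

http {
    map $http_triplea_version $lobby_backend {
        2.6      http://127.0.0.1:5026;
        default  http://127.0.0.1:5025;
    }

    server {
        listen 8080;
        location / {
            proxy_pass $lobby_backend;
        }
    }
}
```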
Preprod - to be or not to be?
Some reasons against a preprod
- When it was dead, we did not really care. This is evidence it is not super useful. EG: preprod was dead for many months, and only in the last few months was there any kind of mounting concern.
- The data in preprod is about as well 'seeded' as the data we seed for a local docker database. Ideally we would improve it to be a really good data set; nonetheless, there is test data parity between a local docker DB & preprod.
- The local docker database gives a good sandbox. It's easy to wipe & recreate data. In the example of timezone issues, that can be readily reproduced locally by changing the client clock or the docker clock.
- Preprod broke a lot; it's extra maintenance and extra servers. Overall, things are simpler without it.
Risks of no Preprod Database
Seemingly these are pretty minimal, since I don't think we did very much with the previous preprod database. The biggest risk I see is that we make some sort of change that injects bad data that a previous (and still running) lobby version cannot deal with. I cannot think of any examples where the preprod database was the thing that prevented a data problem; generally it is the DB testing and local simulation that find any potential issues.
No Lobby Preprod is a non-issue
The fact that the lobby software can have a newer version running is basically a preprod; it's only that we are interacting with prod data. This would probably make feature previews much better, and if we are careful about how we insert & update data, then arguably it's a far better environment for finding problems.
Latest Proposed System Diagram if there is no preprod
My vote is for the versioning through headers. It's not only easier for local development, it's easier to scale when running in production. If you pin the version to a DNS entry and the IP of the destination is updated, you have to wait for that DNS record change to propagate through the internet (based on TTL and syncing of DNS servers) before all clients are routing to the proper location. That problem doesn't happen when routing by header. Once the updated nginx config is deployed, routing happens immediately.
Having the version in the path of the URL is similar to passing it as a header option but locks you into maintaining that pathing in future releases. If a change is made from a path of