RFC: Lobby Version switch using NGINX
Proposal:
- Start deploying all lobby versions ('2.5', '2.6', '2.7', '3.0', etc.) to the one lobby server, and keep them running until we decide to turn off the older ones
- Have every even release number use a 'beta' DB (eg: 2.6, 2.8, 3.0, 3.2), and every odd release number use the prod DB (eg: 2.5, 2.7, 3.1)
- When we mark a game client as a 'release', we increment the release version number, thereby triggering a new lobby; the latest clients then send the new version number and are routed to that new lobby (see the sketch below)
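For illustration, a minimal sketch of what that per-version layout could look like in nginx terms. The ports, upstream names, and exact DB split are assumptions, not settled decisions:

```nginx
# Illustrative only: one lobby process per release, all on the one server.
# Per the proposal, even releases would point at the beta DB and odd
# releases at the prod DB; ports are made up.
upstream lobby_2_5 { server 127.0.0.1:5025; }  # odd  -> prod DB
upstream lobby_2_6 { server 127.0.0.1:5026; }  # even -> beta DB
upstream lobby_2_7 { server 127.0.0.1:5027; }  # odd  -> prod DB
```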
Notable changes:
- beta (AKA unstable) lobby would be hosted on same server as the prod lobby
- we would remove the prerelease linode server (savings of $5/month)
Notable drawbacks:
- we would only have one prod nginx; it is a single point of failure, and misconfigurations there would impact everything
Diagram
Here is what the system would look like (all on one box):
Other notes:
- we would want to update the client code so that, if it gets no response, it prompts the user to download the latest version
- the version out-of-date check should be updated to use the GitHub API to find the latest release version
Comments
- the server load on production is pretty low, we could have quite a few lobbies running there and create a new DB schema on the same server
@DanVanAtta I think the proposed change sounds good, however I'm not sure how you're planning the distribution mechanism. IIRC you were thinking about using headers to do the logic, but to be honest that just sounds like a custom version of virtual hosts.
For example, making a request to `3-4.lobby.triplea-game.org` automatically sets the `Host` header to the domain, making the configuration pretty trivial.
Pros of this approach
- Simple to implement for clients, nginx config is also straightforward
- Potentially allows us to distribute versions onto several servers without too much effort if we ever start noticing the load is too heavy
- If you're completely crazy you could also think about a domain setup like `*.3.lobby.triplea-game.org` and `*.2.lobby.triplea-game.org` and so on to potentially distribute load for several major versions, but that's most likely overkill even though it's a possibility
Cons
- Potentially more complicated setup for certificates and DNS entries. Namecheap and letsencrypt should support wildcard domains, but it's still some one-time extra effort
We are trying to solve request routing (or arguably service discovery) more so than we are trying to achieve virtual hosts.
Routing based on headers is already in place, and we could easily redirect requests to arbitrary hosts by templating the 'localhost' portion of the redirect destinations (see the fragment sketched after these links):
- https://github.com/triplea-game/triplea/blob/master/infrastructure/ansible/roles/nginx/defaults/main.yml#L18
- https://github.com/triplea-game/triplea/blob/master/infrastructure/ansible/roles/nginx/templates/etc_nginx_sites_enabled_default.j2#L36
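For example, a hypothetical fragment in the spirit of the jinja2 template linked above. The variable names here are illustrative; the real ones live in the ansible defaults:

```nginx
# Hypothetical fragment of the nginx sites-enabled jinja2 template: the
# destination is a template variable rather than a hard-coded 'localhost',
# so requests can be redirected to arbitrary hosts.
location /lobby {
    proxy_pass http://{{ lobby_host }}:{{ lobby_port }};
}
```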
Previously, routing was done entirely client side: first parsing an index file (https://github.com/triplea-game/triplea/blob/master/servers.yml), running the current client version through a switch, and then selecting the right host for future requests based on that switch result.
This routing logic is moving to server-side (ie: NGINX config) and how we do that routing in NGINX can be done in multiple ways.
DNS Based Routing
We could have multiple DNS entries referring to the same (NGINX) routing machine. NGINX could either have multiple server blocks or a switch statement using the `Host` header. Note that a switch statement based on a `Host` header and one based on a `Version` header are almost the same thing.
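As a rough sketch of the 'switch statement' form, with made-up hostnames and ports:

```nginx
# One server block covers the whole wildcard domain; the map directive
# 'switches' on the Host header to pick a backend.
map $host $lobby_backend {
    2-5.lobby.triplea-game.org  http://127.0.0.1:5025;
    2-6.lobby.triplea-game.org  http://127.0.0.1:5026;
    default                     http://127.0.0.1:5025;
}

server {
    listen 80;
    server_name *.lobby.triplea-game.org;
    location / {
        proxy_pass $lobby_backend;
    }
}
```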
Though, there are some significant costs to DNS based routing. (1) We require many DNS entries and have to manage them (more on wildcarding later); DNS access is limited, which worsens the problem of "I go on vacation and the TripleA project comes to a complete and full stop." (2) Testing this configuration requires hacking a developer machine's DNS; the net effect is that it is more difficult for others to work on new lobby versions. (3) Deployment is not fully programmatic; we would require manual steps to set up new DNS entries and equally manual steps to remove old ones. (4) DNS is slow to change.
Version Header Based Routing
On the other hand, we could have just one DNS entry and use a 'version' header as the item in the NGINX switch block when determining the correct final host to service a request.
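A sketch of what that could look like. Note it differs from the `Host`-header version only in the map key; the header name 'Triplea-Version' and the ports are assumptions, not settled names:

```nginx
# nginx exposes a request header 'Triplea-Version' as $http_triplea_version.
map $http_triplea_version $lobby_backend {
    2.5      http://127.0.0.1:5025;
    2.6      http://127.0.0.1:5026;
    default  http://127.0.0.1:5025;  # fallback; could instead be a 'please update' stub
}

server {
    listen 80;
    server_name lobby.triplea-game.org;
    location / {
        proxy_pass $lobby_backend;
    }
}
```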
Pros of DNS based routing vs Header based routing
> Simple to implement for clients, nginx config is also straightforward
The header based routing is almost the same regardless of which choice we make. Having multiple server blocks is arguably more complex (but more flexible); either way, this advantage is nearly the same for both options.
> Potentially allows us to distribute versions onto several servers without too much effort if we ever start noticing the load is too heavy
This is the same with either approach. We can route requests to arbitrary servers, and it's pretty easy to add a load balancer config to NGINX, which would be the same regardless of whether we switch on the `Host` header or a `Version` header (see the sketch below). Though, load balancing is not what we are trying to achieve; it turns out we cannot do load balancing, because each lobby version must be a single instance.
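For what it's worth, the load-balancing piece really is small. A sketch with made-up addresses, though as just noted it is moot for the lobby because each version must be a single instance:

```nginx
# Hypothetical: fan one version's traffic out over two hosts.
# nginx round-robins across the servers in an upstream by default.
upstream lobby_2_6 {
    server 10.0.0.5:5026;
    server 10.0.0.6:5026;
}

server {
    server_name lobby.triplea-game.org;
    location / {
        proxy_pass http://lobby_2_6;
    }
}
```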
Server cardinality & request routing, benefits
What we want is one lobby instance for each client version. Wildcard DNS entries would not work because 2.6 clients should be routed to the 2.6 lobby, 2.7 clients to the 2.7 lobby, etc. It would not be the case that any 2.x client goes to a generic 2.x lobby. (If we did wildcarding, there is also the question of how we split 'beta' traffic from the 'prod' versions, but this question is a bit moot given the misunderstanding of lobby instance cardinality.)
Instead, by having the release versions be 'pinned' between client and lobby, we never have to worry about backward compatibility. This is a main benefit. We simply leave the older instances running, and when they start seeing close to zero traffic, we turn them off. To emphasize, this means we never have to worry about a 2.5 client working with a 2.7 lobby; we can completely change the APIs in 2.7 and not worry at all about previous client versions. The only backward compatibility concern we would have is in the database.
Another big benefit is that this configuration is completely programmatic: there is no manual configuration, and changes can be made entirely via pull request to the 'configuration-as-code'.
Last, clients would only need one stable DNS name forever, for any version. The DNS setup becomes a one-time operation, and if we want to migrate everything to a new server stack, it is just one DNS entry to update.
> (3) Deployment is not fully programmatic; we would require manual steps to set up new DNS entries and equally manual steps to remove old ones

This is only partially true: there's the concept of wildcard DNS entries, where all `*.lobby.triplea-game.org` hostnames are redirected to the same server.
@RoiEXLab in such a case we have a single wildcard domain - if we have clients sending requests to URLs like `2-6.lobby.triplea-game.org`, and then we have an 'if/else' statement that switches on the `Host` header, is that really much different from having clients send the version in a header? From a systems perspective, both are switching based on a header value, and both require the client to inject the version somewhere - I think the biggest differences would be in local development and test configuration.
A different aspect to this topic: I'm wondering if we even really need a 'beta' DB?
@DanVanAtta That's precisely my point. It sounds like the same concept, so I was just wondering if we're reinventing the wheel here. By using virtual hosts we're making use of a well-established concept instead of a custom solution.
Local development and testing is a valid point though. I believe the `Host` header can be set as usual here to achieve the same thing, but I'm not 100% sure.
Both approaches have their pros and cons, but in the end they're almost identical.
If reinventing the wheel, which tools are there that do what we want?
Virtual hosts map many DNS names to one host; we want the inverse (so this is a lot more akin to routing and load balancing).
@RoiEXLab also what are your thoughts on the necessity of a preprod database?
CC: @tvleavitt, @bacrossland, y'all might be interested in this topic; feedback is welcome, particularly regarding any ways we can do this more simply and any potential pitfalls you could foresee.
> If reinventing the wheel, which tools are there that do what we want?
I was mainly thinking about nginx's server blocks. I was imagining an nginx config along the lines of this:
```nginx
server {
    include common.conf;
    server_name 2-4.lobby.triplea-game.org;
    location / {
        proxy_pass http://localhost:1337;
    }
}
# repeat for every instance with different port and server_name
```
However, I just read this article and learned that the `map` directive exists, making this pretty convenient regardless. So it doesn't really matter what kind of header is used: a custom header could be used (like they do in the article), or alternatively the `Host` header. It's just a preference whether you want to make it seem like there are many servers from the outside, or hide them completely behind a single reverse proxy.
One thing that would also be possible in theory is to have some sort of versioning already built into the URL path. So `https://lobby.triplea-game.org/2.4/*` would actually forward to `http://localhost:8080/*`, and likewise for other versions. But I'm sure this approach has its own bunch of problems, because now the root of the URL is no longer fixed, so I assume that's why it wasn't considered. Nginx is designed to achieve just this: using this approach we could just chain `location` blocks with `proxy_pass` directives and get a simple but expressive config. Just wanted to mention it for completeness.
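A sketch of that path-prefix variant, with made-up ports. When `proxy_pass` carries a URI part, nginx replaces the matched location prefix, so a request for `/2.4/foo` is forwarded as `/foo`:

```nginx
server {
    server_name lobby.triplea-game.org;

    # /2.4/* -> the 2.4 lobby, version prefix stripped
    location /2.4/ {
        proxy_pass http://127.0.0.1:8080/;
    }

    # /2.5/* -> the 2.5 lobby
    location /2.5/ {
        proxy_pass http://127.0.0.1:8081/;
    }
}
```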
Regarding a preprod database: I think there are some scenarios where it can be useful. This includes being able to find database migration issues that slipped through testing due to a small test set, as well as having a playground to identify and debug reported issues that are not reproducible locally (maybe there's an issue that only occurs if the server's and client's timezones differ). So it can be useful if there's actually data in the preprod database, but the better our testing, code reviews, and QA are, the less useful it becomes, I think.
I think we are on pretty safe ground in terms of 'reinventing the wheel'. The 'wheel' in this case is using NGINX to do request routing based on headers. The overall system design & configuration, though, is the thing we need to build.
Re: version in URL
A rule of thumb I've adopted is to avoid versions in URLs. Essentially, a URL is the name of a resource, and it is an extremely long-lived part of that entity. Using the URL for something with a far shorter life span than the resource creates a mismatch.
FWIW, there are a number of discussions/links supporting this perspective:
- https://stackoverflow.com/questions/972226/how-to-version-rest-uris
- https://sookocheff.com/post/api/how-to-version-a-rest-api/
- https://medium.com/@XenoSnowFox/youre-thinking-about-api-versioning-in-the-wrong-way-6c656c1c516b
I listened to a talk (that I am still trying to find) that described, IIRC, both that version values should go into headers and be routed on, and that these versions should be pinned to older, still-running instances, thereby avoiding the backward compatibility scenario.
Version based header routing, a plus for local development
Version based header routing makes local testing quite feasible: we just spin up a docker container with nginx and the needed routing config file, then send requests to localhost and verify they get routed correctly (a minimal config is sketched below). We are currently able to set up every other component locally, implying we would still be able to simulate the full stack on a local developer machine.
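A minimal, self-contained config for such a local test. The header name, ports, and docker invocation are assumptions for illustration:

```nginx
# Save as nginx.conf and run, e.g.:
#   docker run --rm -p 8080:8080 \
#     -v $PWD/nginx.conf:/etc/nginx/nginx.conf:ro nginx
# Then exercise the routing with:
#   curl -H 'Triplea-Version: 2.6' http://localhost:8080/
# (point the backend ports at local stub servers to see where requests land)
events {}

http {
    map $http_triplea_version $lobby_backend {
        2.6      http://127.0.0.1:5026;
        default  http://127.0.0.1:5025;
    }

    server {
        listen 8080;
        location / {
            proxy_pass $lobby_backend;
        }
    }
}
```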
Preprod - to be or not to be?
Some reasons against a preprod
- When it was dead, we did not really care. This is evidence it is not super useful. EG: preprod was dead for many months, and only in the last few months was there any kind of mounting concern.
- The data in preprod is about as well 'seeded' as the data we seed for a local docker database. Ideally we would improve it to be a really good data set; nonetheless, there is test data parity between a local docker DB & preprod.
- The local docker database gives a good sandbox. It's easy to wipe & recreate data. In the example of timezone issues, that can be readily reproduced locally by changing the client clock or the docker clock.
- Preprod broke a lot; it's extra maintenance and extra servers. Overall, things are simpler without it.
Risks of no Preprod Database
Seemingly these are pretty minimal, since I don't think we did very much with the previous preprod database. The biggest risk I see is that we make some sort of change that injects bad data that a previous (and still running) lobby version cannot deal with. I cannot think of any examples where the preprod database was the thing that prevented a data problem; generally it is the DB testing and local simulation that find any potential issues.
No Lobby Preprod is a non-issue
The fact that the lobby software can have a newer version running is basically a preprod; it's only that we are interacting with prod data. This would probably make feature previews much better, and if we are careful about how we insert & update data, then arguably it's a far better environment for finding problems.
Latest Proposed System Diagram if there is no preprod
My vote is for the versioning through headers. It's not only easier for local development, it's easier to scale when running in production. If you pin the version to a DNS entry and the IP of the destination is updated, you have to wait for that DNS record change to propagate through the internet (based on TTL and syncing of DNS servers) before all clients are routing to the proper location. That problem doesn't happen when routing by header. Once the updated nginx config is deployed, routing happens immediately.
Having the version in the path of the URL is similar to passing it as a header option but locks you into maintaining that pathing in future releases. If a change is made from a path of