Setup HTTP redirects using Google Cloud Traffic Management

Open philwo opened this issue 4 years ago • 3 comments

Bazel's websites are behind two Google Cloud Load Balancers:

  • http-redirector listening on 130.211.20.222:80, 130.211.20.222:443 and 130.211.22.235:80
  • bazel-build listening on 130.211.22.235:443

The http-redirector LB forwards all requests to an nginx instance running on a VM with this config:

http {
  include             /etc/nginx/mime.types;
  default_type        application/octet-stream;

  # Redirect variations of the main website URL to the canonical one.
  server {
    listen 80;
    server_name bazel.build www.bazel.build bazel.io www.bazel.io;
    return 301 https://bazel.build$request_uri;
  }

  # Redirect http:// to https:// and *.bazel.io to *.bazel.build.
  server {
    listen 80;
    server_name ~^(?<subdomain>.+)\.bazel\.(?<tld>.+)$;
    return 301 https://$subdomain.bazel.build$request_uri;
  }

  # Redirect http(s)://ci.bazel.build/ to a new site.
  server {
    listen 80;
    server_name ci.bazel.build;
    return 301 https://github.com/bazelbuild/continuous-integration/blob/master/buildkite/README.md;
  }

  # Catch-all default server that just returns a placeholder response (HTTP 200, no redirect).
  server {
    listen 80 default_server;
    server_name _;
    add_header Content-Type text/plain;
    return 200 "Bazel Redirection Service";
  }
}
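To make the effect of this config easier to reason about, here is a minimal Python sketch of which redirect each Host header ends up with. It assumes nginx's documented server-name precedence (exact names are tried before regular expressions, and the default server handles the rest). The names `EXACT`, `SUBDOMAIN` and `redirect_target` are made up for illustration and are not part of the deployed setup.

```python
import re

# Simplified model of the nginx config above (plain HTTP on port 80 only).
# Exact server names win over the regex server, regardless of block order.
EXACT = {
    "bazel.build": "https://bazel.build{uri}",
    "www.bazel.build": "https://bazel.build{uri}",
    "bazel.io": "https://bazel.build{uri}",
    "www.bazel.io": "https://bazel.build{uri}",
    # The ci redirect has no $request_uri, so the original path is dropped.
    "ci.bazel.build": "https://github.com/bazelbuild/continuous-integration/blob/master/buildkite/README.md",
}
SUBDOMAIN = re.compile(r"^(?P<subdomain>.+)\.bazel\.(?P<tld>.+)$")


def redirect_target(host, uri):
    """Return the Location header for a request, or None for the catch-all server."""
    host = host.lower()
    if host in EXACT:
        return EXACT[host].format(uri=uri)
    match = SUBDOMAIN.match(host)
    if match:
        # Any TLD is mapped to .build; the captured subdomain is kept.
        return "https://{}.bazel.build{}".format(match.group("subdomain"), uri)
    return None
```

For example, `redirect_target("ij.bazel.io", "/docs")` follows the regex server and keeps the `ij` subdomain, while `ci.bazel.build` is handled by its exact-name server even though the regex would also match it.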

The full list of domains that are handled by the load balancers is:

bazel.foo                -> 130.211.20.222
www.bazel.foo            -> 130.211.22.235
tulsi.bazel.foo          -> 130.211.22.235
ij.bazel.foo             -> 130.211.22.235
skydoc.bazel.foo         -> 130.211.22.235

bazel.io                 -> 130.211.20.222
www.bazel.io             -> 130.211.20.222
tulsi.bazel.io           -> 130.211.20.222
ij.bazel.io              -> 130.211.20.222
skydoc.bazel.io          -> 130.211.20.222

bazel.build              -> 130.211.22.235
bcr.bazel.build          -> 130.211.22.235
be.bazel.build           -> 130.211.22.235
blog.bazel.build         -> 130.211.22.235
ci.bazel.build           -> 130.211.22.235
ci-staging.bazel.build   -> 130.211.22.235
cr.bazel.build           -> 130.211.22.235
cs.bazel.build           -> 130.211.22.235
dashboard.bazel.build    -> 130.211.22.235
docs.bazel.build         -> 130.211.22.235
docs-staging.bazel.build -> 130.211.22.235
eclipse.bazel.build      -> 130.211.22.235
ij.bazel.build           -> 130.211.22.235
mirror.bazel.build       -> 130.211.22.235
perf.bazel.build         -> 130.211.22.235
releases.bazel.build     -> 130.211.22.235
skydoc.bazel.build       -> 130.211.22.235
tulsi.bazel.build        -> 130.211.22.235
www.bazel.build          -> 130.211.22.235

(For some unknown reason, the *.foo domains apparently do not work at the moment. 🤷🏻)

The idea behind this setup is that we want to do various kinds of rewrites:

  • Redirect old *.io domains to new *.build domains.
  • Redirect www.bazel.build to bazel.build.
  • Redirect http:// URLs to https:// URLs.
  • Redirect ci.bazel.build to https://github.com/bazelbuild/continuous-integration/blob/master/buildkite/README.md.

The nginx VM was necessary because, for a long time, Google Cloud could not do these kinds of rewrites. However, since 2020 a new feature called Traffic Management has been available, and it looks like it should support this.

The goals of this project would be:

  • Simplify our load-balancer configuration (ideally remove the http-redirect one completely).
  • Shut down and remove the http-redir VM.
  • Implement the additional rewrite rule for https://github.com/bazelbuild/bazel/pull/13519: Redirect https://docs.bazel.build/versions/master/... to https://docs.bazel.build/versions/main/... (but only once documentation is being published under this new URL).
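The intended behavior of the last rewrite rule can be sketched as a plain prefix replacement. This is only a model of the desired result, not a load-balancer configuration; `rewrite_docs_path` and the sample paths used with it are hypothetical.

```python
OLD_PREFIX = "/versions/master/"
NEW_PREFIX = "/versions/main/"


def rewrite_docs_path(path):
    """Map /versions/master/... to /versions/main/..., leaving other paths alone."""
    if path.startswith(OLD_PREFIX):
        # Keep everything after the old prefix unchanged.
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    return path
```

Matching on the full prefix including the trailing slash avoids touching unrelated paths that merely start with `/versions/master`.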

philwo, Jun 07 '21 11:06

@coeuvre When you have some time, could you look into whether this is feasible? :)

philwo, Jun 07 '21 11:06

I just checked the docs. The kinds of rewrites we want should work, except for:

  • Redirect old *.io domains to new *.build domains.

IIUC, Traffic Management doesn't support referencing a captured subdomain in redirect rules. For example, if we want to redirect ij.bazel.io and skydoc.bazel.io to ij.bazel.build and skydoc.bazel.build respectively, we have to create a separate rule for each host name (a single *.bazel.io rule won't work). Since the set of bazel.io subdomains is fixed, we can create the rules manually for each of them, but this adds to the maintenance burden in the future.
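One way to keep that burden small would be to generate the per-host rules from a list instead of writing them by hand. The sketch below is an assumption-laden illustration: the field names (`hostRules`, `pathMatchers`, `defaultUrlRedirect`, `hostRedirect`, `httpsRedirect`, `redirectResponseCode`, `stripQuery`) follow the URL map resource as I understand it and should be checked against the current reference, and `io_redirect_rules` plus the `redirect-...` matcher names are hypothetical.

```python
# Subdomains of bazel.io taken from the domain list in this issue.
IO_SUBDOMAINS = ["tulsi", "ij", "skydoc"]


def io_redirect_rules(subdomains):
    """Build one host rule and one path matcher per bazel.io host name."""
    # Apex and www both go to the canonical bazel.build, as in the nginx config.
    targets = {"bazel.io": "bazel.build", "www.bazel.io": "bazel.build"}
    for sub in subdomains:
        targets[sub + ".bazel.io"] = sub + ".bazel.build"

    host_rules, path_matchers = [], []
    for host, target in sorted(targets.items()):
        name = "redirect-" + host.replace(".", "-")  # hypothetical naming scheme
        host_rules.append({"hosts": [host], "pathMatcher": name})
        path_matchers.append({
            "name": name,
            "defaultUrlRedirect": {
                "hostRedirect": target,
                "httpsRedirect": True,
                "redirectResponseCode": "MOVED_PERMANENTLY_DEFAULT",
                "stripQuery": False,
            },
        })
    return {"hostRules": host_rules, "pathMatchers": path_matchers}
```

Adding a new bazel.io subdomain would then mean appending one entry to `IO_SUBDOMAINS` and regenerating the fragment, rather than editing two config entries by hand.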

Also, we will have to keep one dedicated LB for HTTP to HTTPS redirect purpose only due to the limitation.

Can you explain why you would like to remove the http-redir component from our setup? Maybe we can come up with other solutions to make it better.

coeuvre, Jun 08 '21 06:06

> Can you explain why you would like to remove the http-redir component from our setup? Maybe we can come up with other solutions to make it better.

We were asked by the security team during the initial review of our infrastructure to eventually remove our nginx VM. They were just worried that it's a publicly accessible service that can potentially be hacked. For a while it seemed like there's no way around it, but with the new URL redirect feature it could be possible.

I just played around with the Load Balancer settings a bit to get a better understanding of what they can do, and I think I've actually already managed to implement my first idea! 🤔 I implemented the redirects directly in the load balancer config and verified the behavior before vs. after with curl -v. It all looks fine to me. This should allow us to turn off the VM.
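For anyone who wants to repeat the before/after check without eyeballing curl output, a rough Python equivalent could look like this. It is a sketch rather than the procedure actually used here; `fetch_redirect` and `diff_redirects` are hypothetical helpers built only on the standard library.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10):
    """Issue one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def diff_redirects(before, after):
    """Return the URLs whose (status, Location) pair differs between two snapshots."""
    return {
        url: (before[url], after.get(url))
        for url in before
        if before[url] != after.get(url)
    }
```

A snapshot is just `{url: fetch_redirect(url) for url in urls}`, taken once before and once after the config change; an empty result from `diff_redirects` means the observable redirects did not change.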

> Also, we will have to keep one dedicated LB for HTTP to HTTPS redirect purpose only due to the limitation.

I think it's possible to include a port number in the host matcher - then we could do both in the same LB! But the configuration would become really ugly as we would need two config entries for each domain. Maybe it's easier to just keep it in two separate LBs, then we can configure the "http-redirect" LB with a catch-all rule to just do http -> https redirection.
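To illustrate the two-LB option, here is a small model of what a catch-all http -> https rule would do to a request URL: only the scheme changes, while host, path and query are kept. This is a behavioral sketch under that assumption, not an actual load-balancer config, and `catch_all_https_redirect` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def catch_all_https_redirect(url):
    """Model a catch-all rule: same host, path and query, but https instead of http."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None  # Nothing to do for requests that are not plain http.
    # hostname drops any explicit port, so :80 does not leak into the https URL.
    return urlunsplit(("https", parts.hostname, parts.path or "/", parts.query, ""))
```

With such a rule, a request for an old *.io host over http would first be sent to the https URL on the same host, and any host rewrite would happen in a second hop on the https side.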

I'll think about it.

philwo, Jun 08 '21 20:06