Static file serving slowness

Open franckc opened this issue 4 years ago • 6 comments

We've noticed intermittent slowness when serving static files.

Root cause is still TBD. Might be due to Google shared disk performance degrading under high load?

A cheap mitigation could be to move Brave to CDN serving since that is likely the store that causes high load.
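To help narrow down when the slowness occurs, a small probe could time repeated fetches of one static file and summarize the results. This is only a sketch: the function names, the sample count, and the pause between requests are assumptions for illustration, and the URL to probe would need to be a real static asset from one of the shops.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=10.0):
    """Return the seconds taken to fetch `url` and read the full body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def summarize(samples):
    """Return median, approximate p95, and max for a non-empty list of durations."""
    ordered = sorted(samples)
    # Nearest-rank style index for the 95th percentile.
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def probe(url, count=20, pause=1.0, timer=time_request):
    """Fetch `url` `count` times, pausing between requests, and summarize timings."""
    samples = []
    for _ in range(count):
        samples.append(timer(url))
        time.sleep(pause)
    return summarize(samples)
```

Running `probe` on a schedule and comparing the p95 and max values across the day would show whether the slow periods line up with high load.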

franckc avatar Jan 20 '21 22:01 franckc

Has this shown up again or is this just to track the issue?

Will see about moving the one store to the CDN. Would alleviate some load from the backend nodes.

I also want to take a couple of days in the near future to migrate to a new dshop-only cluster. It'll reduce the technical debt that comes from being tied up with the old marketplace, bring in likely thousands of bugfixes, and give us better opportunities for troubleshooting. It's probably the only way we're going to see the long-term hands-off stability we're looking for.

mikeshultz avatar Jan 21 '21 16:01 mikeshultz

There hasn't been any recent slowness report that I know of... All the known reports you have handled :)

Agreed moving the big store that starts with a B to CDN makes sense.

Also agreed that if moving to a new dedicated shiny kube cluster is only a couple days of work, then it is probably an investment that will pay off by bringing more stability to the system. Can you think a bit more about the detailed scope of the work and then we can finalize the decision?

franckc avatar Jan 21 '21 23:01 franckc

A lot of the scope is "figure it out as I go along" but generally:

  1. Make sure there haven't been any revolutionary new Kubernetes services added to GCP that might replace GKE. If not...
  2. Stand up a new relatively small GKE cluster. Large enough to leave some room to wiggle but not break the wallet.
  3. Create necessary helm charts in dshop repo, mostly based off existing ones. Should generally just be moving and making small tweaks for the new cluster or to clean up unnecessary leftovers from previous configuration. Not reinventing the wheel here.
  4. Apply charts to new cluster
  5. Test it's working. If not, tweak and GOTO 3
  6. Migrate DNS records to move traffic to the new cluster

Shouldn't be any need to transfer data or anything like that. Will probably take a day or so, but I quoted 2 in proper engineering fashion of dealing with the unknown unknowns. And should it be necessary, we could revert to the old cluster.
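Step 5 could be checked before touching DNS by fetching the same paths from both clusters and comparing the responses. This is a rough sketch, assuming the new cluster is reachable under some temporary base URL; the function names and the idea of comparing body hashes are illustrative assumptions, not something already in the dshop repo.

```python
import hashlib
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return (HTTP status, SHA-256 hex digest of the body) for `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            return response.status, hashlib.sha256(body).hexdigest()
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; report the status code without a body hash.
        return error.code, None


def compare_clusters(old_base, new_base, paths, fetcher=fetch):
    """Return the paths whose status or body differ between the two clusters."""
    mismatches = []
    for path in paths:
        old_result = fetcher(old_base + path)
        new_result = fetcher(new_base + path)
        if old_result != new_result:
            mismatches.append(path)
    return mismatches
```

An empty list from `compare_clusters` would suggest the new cluster serves the sampled files identically. Dynamic responses will differ by nature, so this only makes sense for static assets.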

mikeshultz avatar Jan 21 '21 23:01 mikeshultz

Thanks. I think it's worth standing up a new cluster.

As a bonus, I'd encourage us to document the setup steps and make the config self-contained so that anyone from the community can easily bring up their own Kubernetes dshop backend.

franckc avatar Jan 22 '21 18:01 franckc

New cluster deployed. Want to migrate our largest shop to a CDN ahead of time just to minimize the likelihood of double-downtime. The new cluster is up and running, sans queue workers to prevent collisions. Migrating should be as simple as updating the DNS record for dshop.originprotocol.com to point at the new IP.

There's one gotcha we will need to take care of. In some custom domain cases (root domains) we instructed users to point an A record at an IP address. That IP is the address of the old issuer. Immediately after the migration, the old cluster will continue to serve their shop from this IP. Before shutting down the old dshop cluster, we should move the IP to a new Kubernetes LoadBalancer pointed at the issuer on the new cluster. This should be relatively trivial, but may result in a few minutes of downtime for these users. Just don't want to forget this, so documenting here.
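To see which shops are affected, something like the following could list the custom domains that still resolve to the old issuer IP. It is a sketch under assumptions: the list of custom domains would have to come from wherever shop domains are stored, and the function names are hypothetical. It uses the standard library resolver, so it reports whatever the local DNS currently returns.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses `hostname` currently resolves to."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers resolution failures such as unknown or misconfigured hosts.
        return set()
    return set(addresses)


def domains_on_old_ip(domains, old_ip, resolver=resolve_ipv4):
    """Return the domains whose resolved addresses include `old_ip`."""
    return [domain for domain in domains if old_ip in resolver(domain)]
```

Any domain this returns after the DNS switch is one that would go dark if the old cluster were shut down before the IP is moved.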

mikeshultz avatar Mar 04 '21 01:03 mikeshultz

The previously mentioned shop has been moved over to the CDN. Clear to migrate the dshop cluster whenever. I expect the migration to be smooth, but it's not a high priority so I may drag my feet on it a little.

mikeshultz avatar Mar 04 '21 04:03 mikeshultz