Garbage-collect cache.nixos.org

Open edolstra opened this issue 2 years ago • 15 comments

Is your feature request related to a problem? Please describe.

We need to reduce our S3 storage costs.

https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672

Describe the solution you'd like

Write a program that does the following:

  • Delete old NixOS/Nixpkgs releases from https://releases.nixos.org/. (Instead of deleting, we could move them to Glacier as an intermediate step.) Specifically:
    • For every NixOS release older than 2 years, delete all point releases except the most recent. E.g. for nixos-14.04, we keep nixos-14.04.630.8a3eea0 (since that's the latest) and delete all the others.
    • For every NixOS release younger than 2 years, delete all pre-releases older than 1 month.
    • For Nixpkgs, we don't have stable releases, so just delete all releases older than 2 years.
  • Run a garbage-collect on cache.nixos.org, using the following store paths as roots:
    • The contents of every store-paths.xz from the NixOS/Nixpkgs releases.
    • The current Hydra roots (see hydra-update-gc-roots).
    • The fallback-paths.nix files from every stable Nix release.
    • Every content-addressed store path (i.e. whose .narinfo has a CA field). This prevents GC-ing source tarballs.
    • Every store path older than TBD whose name ends in .tar.[gz|xz|lzma|...]. This prevents GC-ing source tarballs that predate the introduction of the CA field.
    • Every store path whose .narinfo is younger than 2 years.
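
The rules above could be sketched roughly as follows. This is only an illustration in Python, not an existing tool: the `Release` and `NarInfo` records, the function names, and the choice to date a release series by its newest point release are all assumptions made for the example. The tarball cutoff stays a parameter because it is still TBD, and `explicit_roots` stands for the union of the store-paths.xz contents, the Hydra roots, and the fallback-paths.nix entries.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TWO_YEARS = timedelta(days=730)
ONE_MONTH = timedelta(days=30)
# Only the suffixes named in the proposal; the real list is longer ("...").
TARBALL_RE = re.compile(r"\.tar\.(gz|xz|lzma)$")


@dataclass(frozen=True)
class Release:  # hypothetical record for one directory on releases.nixos.org
    name: str        # e.g. "nixos-14.04.630.8a3eea0"
    series: str      # e.g. "nixos-14.04"
    channel: str     # "nixos" or "nixpkgs"
    released: datetime
    prerelease: bool = False


@dataclass(frozen=True)
class NarInfo:  # hypothetical, simplified view of a .narinfo
    store_path: str
    references: tuple[str, ...]
    created: datetime            # assumption: upload time of the .narinfo
    ca: Optional[str] = None     # the CA field, if present


def releases_to_delete(releases: list[Release], now: datetime) -> set[str]:
    """Apply the three release retention rules and return the names to delete."""
    doomed: set[str] = set()
    by_series: dict[str, list[Release]] = {}
    for r in releases:
        if r.channel == "nixpkgs":
            # Nixpkgs has no stable releases: drop everything older than 2 years.
            if now - r.released > TWO_YEARS:
                doomed.add(r.name)
        else:
            by_series.setdefault(r.series, []).append(r)
    for members in by_series.values():
        latest = max(members, key=lambda r: r.released)
        if now - latest.released > TWO_YEARS:
            # Old series: keep only the most recent point release.
            doomed.update(r.name for r in members if r.name != latest.name)
        else:
            # Young series: drop pre-releases older than 1 month.
            doomed.update(
                r.name for r in members
                if r.prerelease and now - r.released > ONE_MONTH
            )
    return doomed


def is_root(info: NarInfo, explicit_roots: set[str], now: datetime,
            tarball_cutoff: Optional[datetime]) -> bool:
    """Decide whether a store path is a GC root under the proposed rules."""
    if info.store_path in explicit_roots:
        return True
    if info.ca is not None:
        return True
    if (tarball_cutoff is not None and info.created < tarball_cutoff
            and TARBALL_RE.search(info.store_path)):
        return True
    return now - info.created < TWO_YEARS


def live_paths(infos: dict[str, NarInfo], explicit_roots: set[str],
               now: datetime,
               tarball_cutoff: Optional[datetime] = None) -> set[str]:
    """Mark phase: every path reachable from a root via references is live."""
    stack = [p for p, i in infos.items()
             if is_root(i, explicit_roots, now, tarball_cutoff)]
    live: set[str] = set()
    while stack:
        path = stack.pop()
        if path in live or path not in infos:
            continue
        live.add(path)
        stack.extend(infos[path].references)
    return live
```

`live_paths` returns everything reachable from the roots; a real run would delete (or move to Glacier) only the paths outside that set, and would have to read the references from the actual .narinfo files rather than from an in-memory dict.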

edolstra avatar Oct 02 '23 13:10 edolstra

Please don't. As described in https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672/4, GC'ing should be the last resort.

Plenty of alternatives have been suggested in the Discourse post. In general, they describe ways to keep a GET https://cache.nixos.org/nar/…* working without having to keep all NAR files in the S3 bucket.

flokli avatar Oct 02 '23 14:10 flokli

The most significant cost bump is due to direct access to the S3 bucket. See https://github.com/NixOS/nixos-org-configurations/issues/277#issuecomment-1741777458 .

zimbatm avatar Oct 02 '23 14:10 zimbatm

@flokli No, GCing is definitely necessary. It's what we promised to Amazon. (The increase in data traffic to the S3 bucket is the most pressing, but we also need to reduce storage costs.)

edolstra avatar Oct 02 '23 14:10 edolstra

@flokli No, GCing is definitely necessary. It's what we promised to Amazon. (The increase in data traffic to the S3 bucket is the most pressing, but we also need to reduce storage costs.)

I assume we didn't promise AWS to GC, but to reduce the amount of data we store there, no?

flokli avatar Oct 02 '23 15:10 flokli

I think it is important to have a meeting on the subject to set the expectations straight, @edolstra

We made multiple suggestions of a task force to explore getting the historical data somewhere else, etc. I feel like it's important not to let those offers of help down.

The people who want to unblock the situation and make it better need more communication from the people driving this matter.

I propose to convene a meeting to lay out a plan for our S3 situation w.r.t. our promises to AWS. It should include:

  • a timeline for letting folks figure out a better solution than GC'ing
  • timeline for executing a GC if we don't meet our alternative solution deadline

Otherwise, people have been working for naught and this is disrespectful. And I agree that if nothing happens, we should just proceed with the GCing solution.

RaitoBezarius avatar Oct 02 '23 15:10 RaitoBezarius

I assume we didn't promise AWS to GC, but to reduce the amount of data we store there, no?

Well, that's the same thing (unless we migrate away from S3 entirely). We can of course mirror closures elsewhere before GCing, if we want to pay the egress costs.

I propose to convene a meeting

Good idea.

GCing is not super-urgent (compared to dealing with the increase in traffic costs), but we should be able to show AWS that we're making some progress.

edolstra avatar Oct 02 '23 17:10 edolstra

Well, that's the same thing (unless we migrate away from S3 entirely).

No - for example, we could migrate away from S3 partially and use a cheaper/less reliable tier of long-term storage for older/less frequently accessed files. We'd keep the strong reliability and performance benefits of S3 for the most important paths, and less important paths could be held on other best-effort mirrors. That's just one potential set of "intermediate solutions"; there is likely a whole gradient of them.

(This is not an invitation to design such a solution here - the fact that they potentially exist and would be viable is sufficient for the sake of this argument.)

Performing a GC means these options disappear forever.

delroth avatar Oct 02 '23 17:10 delroth

we could migrate away from S3 partially

That's what I meant with "We can of course mirror closures elsewhere before GCing".

edolstra avatar Oct 02 '23 17:10 edolstra

GCing is not super-urgent (compared to dealing with the increase in traffic costs), but we should be able to show AWS that we're making some progress.

Thanks for confirming this. So https://github.com/NixOS/nixos-org-configurations/issues/277 is probably more effective in reducing costs.

That's what I meant with "We can of course mirror closures elsewhere before GCing".

Deleting things from the bucket can even happen as a step in the migration, if we have the logic in place to steer requests to the long-term storage, so these things are not mutually exclusive.

(This is not an invitation to design such a solution here - the fact that they potentially exist and would be viable is sufficient for the sake of this argument.)

I am interested in designing such a solution, and would be okay with driving this forward in some capacity. I'm open to having a meeting to discuss plans etc.

flokli avatar Oct 02 '23 17:10 flokli

Here we go, please fill this in so we can schedule the discussion: https://crab.fit/cachenixosorg-size-reduction-strategies-420938

RaitoBezarius avatar Oct 02 '23 17:10 RaitoBezarius

@flokli @edolstra @zimbatm let's meet on the 24th of October at 4 PM CEST, according to that poll.

RaitoBezarius avatar Oct 09 '23 15:10 RaitoBezarius

That's in 30 minutes. Coordination in https://matrix.to/#/#infra:nixos.org I guess?

vcunat avatar Oct 24 '23 13:10 vcunat

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-10-24-re-long-term-s3-cache-solutions-meeting-minutes-1/34580/1

nixos-discourse avatar Oct 25 '23 12:10 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-long-term-resolution-phase-1/36493/1

nixos-discourse avatar Dec 05 '23 17:12 nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/s3-sponsorship-extension-more-resources-to-build-a-more-sustainable-nix/50936/1

nixos-discourse avatar Aug 21 '24 16:08 nixos-discourse