gcr-cleaner

Add tag-filter-none (or tag-filter-exclude)

Open StevenACoffman opened this issue 3 years ago • 9 comments

Currently, we have automation that tags all of our container images with the git commit SHA1 of the source code that generated them. We would like to clean up old and unused ones. We know which git commit SHA1 is currently deployed, and would like to exclude it from being deleted. The -grace flag gives us some protection: we just pick a generous duration and hope that our release cadence ensures that we are safe. But if we stop working in the repo for more than a month, someone has to remember to turn off the cleaner.

So for now we are doing the equivalent of this:

gcr-cleaner-cli -grace 720h -repo gcr.io/khan-internal-services/districts-jobs-roster -allow-tagged -keep "0" -tag-filter-any '.*'

Since golang doesn't support negative lookaheads in regex, we can instead use -tag-filter-all and play games with a non-match regex (this is a handy utility for that) that matches anything that isn't exactly the git commit SHA1. For the git SHA1 15852b1a96a5126f3fc83ba48437c98cf0768ad5:

gcr-cleaner-cli -grace 720h -repo gcr.io/khan-internal-services/districts-jobs-roster -allow-tagged -keep "0" -tag-filter-any '^([^1]|1(1|5(852b1(5|a96a515))*(1|8(1|5(1|2(1|b1(1|a(1|9(1|6(1|a(1|51(1|2(1|6(1|f(1|3(1|f(1|c(1|8(1|3(1|b(1|a(1|4(1|8(1|4(1|3(1|7(1|c(1|9(1|8(1|c(1|f(1|0(1|7(1|6(1|8(1|a(1|d1))))))))))))))))))))))))))))))))))))*([^15]|5(852b1(5|a96a515))*([^18]|8([^15]|5([^12]|2([^1b]|b([^1]|1([^15a]|a([^19]|9([^16]|6([^1a]|a([^15]|5([^1]|1([^125]|2([^16]|6([^1f]|f([^13]|3([^1f]|f([^1c]|c([^18]|8([^13]|3([^1b]|b([^1a]|a([^14]|4([^18]|8([^14]|4([^13]|3([^17]|7([^1c]|c([^19]|9([^18]|8([^1c]|c([^1f]|f([^01]|0([^17]|7([^16]|6([^18]|8([^1a]|a([^1d]|d[^15])))))))))))))))))))))))))))))))))))))))*(1(1|5(852b1(5|a96a515))*(1|8(1|5(1|2(1|b1(1|a(1|9(1|6(1|a(1|51(1|2(1|6(1|f(1|3(1|f(1|c(1|8(1|3(1|b(1|a(1|4(1|8(1|4(1|3(1|7(1|c(1|9(1|8(1|c(1|f(1|0(1|7(1|6(1|8(1|a(1|d1))))))))))))))))))))))))))))))))))))*(5(852b1(5|a96a515))*(8(5?|52(b?|b1(a(9?|96((a5?)?|a51(2(6?|6f(3?|3f(c(8((3(b?|ba(4(8?|84(3?|37(c?|c9(8?|8c(f?|f0(7?|768?)))))))?))?|3ba48437c98cf0768ad?))?)?)))?)))?)))?)?)?$'

This seems... yucky and vaguely horrifying so we would rather have the capability to exclude certain tags using tag-filter-none (or tag-filter-exclude).

StevenACoffman, Dec 16 '21 15:12
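For readers following along, here is a minimal Go sketch of the limitation described above. It assumes nothing about gcr-cleaner's internals: `compiles` and `isNotExactly` are hypothetical helper names. It shows that Go's regexp package rejects negative-lookahead syntax, and that the same "anything but this tag" test can be written as a plain anchored match that is negated in code.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// isNotExactly reports whether tag differs from the protected tag.
// It matches the protected tag literally (via QuoteMeta) and negates the
// result in code, which is what a negative lookahead would express.
func isNotExactly(protected, tag string) bool {
	re := regexp.MustCompile("^" + regexp.QuoteMeta(protected) + "$")
	return !re.MatchString(tag)
}

func main() {
	const sha = "15852b1a96a5126f3fc83ba48437c98cf0768ad5"

	// The lookahead form is not valid syntax for Go's regexp package.
	fmt.Println("lookahead compiles:", compiles(`^(?!`+sha+`$).*$`))

	// Negating a simple match gives the exclusion without a giant pattern.
	fmt.Println("deployed tag selected:", isNotExactly(sha, sha))
	fmt.Println("other tag selected:", isNotExactly(sha, "0000000"))
}
```

The negation has to live in the calling code, though, which is why it cannot be expressed through a flag that only accepts a pattern.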

It would be great to have a feature where we could play with both tag_filter_any and keep. Say I would like to keep 10 tags matching main.* and 10 tags matching dev.*.

The only solution I can think of right now is having 2 different schedulers with different tag_filter_any values.

joaocarmo, Jan 06 '22 15:01
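As an illustration of the per-pattern keep requested above, here is a rough Go sketch. It is not gcr-cleaner code: the `image` type, the `CreatedUnix` field, and `keepNewestPerPattern` are hypothetical names, and the sketch assumes each image has a single tag and a creation timestamp.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// image is a hypothetical stand-in for one tagged image in a repository.
type image struct {
	Tag         string
	CreatedUnix int64 // creation time in Unix seconds
}

// keepNewestPerPattern returns the set of tags to keep: for each pattern,
// the n most recently created images whose tag matches that pattern.
func keepNewestPerPattern(images []image, patterns []string, n int) (map[string]bool, error) {
	keep := make(map[string]bool)
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}

		// Collect the images whose tag matches this pattern.
		var matched []image
		for _, img := range images {
			if re.MatchString(img.Tag) {
				matched = append(matched, img)
			}
		}

		// Sort newest first, then keep at most n of them.
		sort.Slice(matched, func(i, j int) bool {
			return matched[i].CreatedUnix > matched[j].CreatedUnix
		})
		for i := 0; i < len(matched) && i < n; i++ {
			keep[matched[i].Tag] = true
		}
	}
	return keep, nil
}

func main() {
	images := []image{
		{Tag: "main-1", CreatedUnix: 100},
		{Tag: "main-2", CreatedUnix: 200},
		{Tag: "main-3", CreatedUnix: 300},
		{Tag: "dev-1", CreatedUnix: 400},
	}
	keep, err := keepNewestPerPattern(images, []string{`^main.*`, `^dev.*`}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(keep)
}
```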

@StevenACoffman for your use case, why not set keep to 1 or 5? That would ensure the N most recent versions are kept, regardless of timestamp.

sethvargo, Jan 06 '22 16:01

  • keep - If an integer is provided, it will always keep that minimum number of images. Note that it will not consider images inside the grace duration.

Is keep=1 per docker tag or per repository? gcr.io/khan-internal-services/districts-jobs-roster has countless git SHA1 docker tags that I do not wish to retain, and only 1 (or perhaps 2) that I do wish to retain. However, it is also quite possible that the chronologically latest docker tag is not yet the one in production. The one currently in production must be at all costs preserved.

Repository -

A collection of tags grouped under a common prefix (the name component before :). For example, in an image tagged with the name my-app:3.1.4, my-app is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by the service's DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.

Tag -

A tag serves to map a descriptive, user-given name to any single image ID.

Image Name -

Informally, the name component after any prefixing hostnames and namespaces.

StevenACoffman, Jan 06 '22 17:01

It's per repository.

...has countless git SHA1 docker tags that I do not wish to retain, and only 1 (or perhaps 2) that I do wish to retain. However, it is also quite possible that the chronologically latest docker tag is not yet the one in production.

I think the better practice here is to tag images that are in production with prod and then you can exclude those (or run a separate process to keep the latest N).

sethvargo, Jan 06 '22 19:01

Sounds great to me. So would you be open to a pull request that would add tag-filter-exclude so that I could do tag-filter-exclude=prod?

StevenACoffman, Jan 06 '22 20:01

I think we'd need to figure out how that plays with the existing -tag-filter.

sethvargo, Jan 06 '22 21:01

I would assume that it would err on the side of caution: if both -tag-filter-any and -tag-filter-exclude match the same tag, exclude wins and the tag is not deleted. "First, do no harm" and all that.

StevenACoffman, Jan 06 '22 22:01
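The "exclude wins" precedence proposed above could look roughly like the following Go sketch. This is only an illustration of the proposal: `shouldDelete` is a hypothetical function, -tag-filter-exclude does not exist in gcr-cleaner, and the sketch assumes that an image is protected as soon as any one of its tags matches the exclude pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// shouldDelete reports whether an image with the given tags is a deletion
// candidate. anyPattern selects candidates and excludePattern protects them.
// Exclusion is checked first, so it wins when both patterns match.
func shouldDelete(tags []string, anyPattern, excludePattern string) (bool, error) {
	anyRe, err := regexp.Compile(anyPattern)
	if err != nil {
		return false, fmt.Errorf("invalid any pattern: %w", err)
	}
	excludeRe, err := regexp.Compile(excludePattern)
	if err != nil {
		return false, fmt.Errorf("invalid exclude pattern: %w", err)
	}

	// "First, do no harm": a single protected tag keeps the whole image.
	for _, t := range tags {
		if excludeRe.MatchString(t) {
			return false, nil
		}
	}

	// Otherwise the image is a candidate if any tag matches.
	for _, t := range tags {
		if anyRe.MatchString(t) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	examples := [][]string{
		{"15852b1a96a5126f3fc83ba48437c98cf0768ad5", "prod"},
		{"0123456789abcdef0123456789abcdef01234567"},
	}
	for _, tags := range examples {
		del, err := shouldDelete(tags, `.*`, `^prod$`)
		if err != nil {
			panic(err)
		}
		fmt.Println(tags, "delete:", del)
	}
}
```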

I think that makes sense, but it will be a little bit of a refactor. The current behavior is that all tag filters are mutually exclusive with each other.

sethvargo, Jan 06 '22 22:01

I have a similar use case as OP, and not being able to easily exclude images by tag makes this utility much less useful.

Another approach would be to add a -tag-filter-action flag that takes values delete or exclude (default to delete). If set to exclude then the list of images matched by -tag-filter-any and -tag-filter-all would be excluded.

porjo, Sep 07 '22 23:09

Actually, #107 was rather cool, so it would be great to incorporate something like that.

ingwarsw, Dec 22 '22 10:12

Kind of in the same boat: I'd like to remove everything older than N days that is not tagged with something following a SemVer format, and I was surprised to find that I couldn't use a negative lookahead.

The flag --tag-filter-exclude would make sense.

Maybe we could just say that you use either --tag-filter-exclude or --tag-filter-all; if both are used, it raises an error (like for all vs any).

It would probably make things simpler?

It would just return a predicate, like the rest:

https://github.com/GoogleCloudPlatform/gcr-cleaner/blob/main/pkg/gcrcleaner/filter.go#L45

No refactoring needed I believe?

panthony, Dec 29 '22 14:12
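A rough Go sketch of the mutually exclusive variant suggested above might look like this. It is not copied from filter.go, and it does not claim to match the types there: `tagPredicate` and `buildTagPredicate` are hypothetical names, and the SemVer pattern is a simplified assumption for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// tagPredicate reports whether an image with the given tags is selected
// for deletion. Hypothetical type, for illustration only.
type tagPredicate func(tags []string) bool

// buildTagPredicate builds a predicate from either an "all" pattern or an
// "exclude" pattern. Setting both (or neither) is an error, mirroring the
// mutually exclusive behavior described in the thread.
func buildTagPredicate(all, exclude string) (tagPredicate, error) {
	switch {
	case all != "" && exclude != "":
		return nil, errors.New("only one of all or exclude may be set")
	case all != "":
		re, err := regexp.Compile(all)
		if err != nil {
			return nil, fmt.Errorf("invalid all pattern: %w", err)
		}
		// Selected only if every tag matches.
		return func(tags []string) bool {
			for _, t := range tags {
				if !re.MatchString(t) {
					return false
				}
			}
			return true
		}, nil
	case exclude != "":
		re, err := regexp.Compile(exclude)
		if err != nil {
			return nil, fmt.Errorf("invalid exclude pattern: %w", err)
		}
		// Selected only if no tag matches the exclude pattern.
		return func(tags []string) bool {
			for _, t := range tags {
				if re.MatchString(t) {
					return false
				}
			}
			return true
		}, nil
	default:
		return nil, errors.New("one of all or exclude must be set")
	}
}

func main() {
	// Simplified SemVer-like pattern (assumption): optional "v", then X.Y.Z.
	selected, err := buildTagPredicate("", `^v?\d+\.\d+\.\d+$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(selected([]string{"v1.2.3"}))
	fmt.Println(selected([]string{"15852b1"}))
}
```

Note that both predicates are vacuously true for an image with no tags, so untagged images would need to be handled separately.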

Any plans to implement this feature? That would be amazing, considering we already had the code; it's just that the user didn't want to sign the CLA. Does #107 look good code-wise? I'd be up to submit it if that's the case.

sidineyc, Feb 23 '23 14:02

Hello, are there any updates on this issue?

shydefoo, May 22 '23 06:05

Hi all - users should prefer the native Google Artifact Registry functionality instead of gcr-cleaner. We are only fixing bugs and security issues in gcr-cleaner now that there's a native (and free) feature in the Google Cloud product.

sethvargo, Mar 06 '24 13:03