GC issue when using S3 as backend storage

Open sizowie opened this issue 6 months ago • 10 comments

Hi,

we're facing an issue with the GC when using S3 as backend storage. There are already closed issues with similar behavior (like #18896), but none of them has a sufficient closing reason.

Release: Harbor 2.13.1
Problem: GC only removes unused manifest objects from S3. Blobs and layers remain on S3. The bucket is set up without versioning. See logs below.

Before GC:

# aws s3 ls s3://registry/ --recursive
2025-06-23 07:16:17       5510 docker/registry/v2/blobs/sha256/85/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c/data
2025-06-23 07:16:18        625 docker/registry/v2/blobs/sha256/bc/bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6/data
2025-06-23 07:16:16  138165992 docker/registry/v2/blobs/sha256/fa/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486/data
2025-06-23 07:16:17         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c/link
2025-06-23 07:16:16         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486/link
2025-06-23 07:16:18         71 docker/registry/v2/repositories/janek/test1/_manifests/revisions/sha256/bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6/link
2025-06-23 07:16:18         71 docker/registry/v2/repositories/janek/test1/_manifests/tags/latest/current/link
2025-06-23 07:16:18         71 docker/registry/v2/repositories/janek/test1/_manifests/tags/latest/index/sha256/bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6/link

After GC via Harbor UI (with untagged-artifacts checked):

registry:
...
time="2025-06-23T10:01:19.488973985Z" level=info msg="authorized request" go.version=go1.23.8 http.request.host="registry:5000" http.request.id=c5257b96-1257-46bc-b603-b75bf555ad7a http.request.method=HEAD http.request.remoteaddr="x.x.8.30:44044" http.request.uri="/v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6" http.request.useragent=harbor-registry-client vars.name="janek/test1" vars.reference="sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6"
time="2025-06-23T10:01:19.513238586Z" level=info msg="redis: connect redis:6379" go.version=go1.23.8 instance.id=7be07b2e-4e6a-4c28-8dee-95e91ef99831 redis.connect.duration=2.440368ms service=registry version=v2.8.3-23-g0c62ec3e.m
time="2025-06-23T10:01:19.52625557Z" level=info msg="response completed" go.version=go1.23.8 http.request.host="registry:5000" http.request.id=c5257b96-1257-46bc-b603-b75bf555ad7a http.request.method=HEAD http.request.remoteaddr="x.x.8.30:44044" http.request.uri="/v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6" http.request.useragent=harbor-registry-client http.response.contenttype="application/vnd.oci.image.manifest.v1+json" http.response.duration=325.814066ms http.response.status=200 http.response.written=625
x.x.8.30 - - [23/Jun/2025:10:01:19 +0000] "HEAD /v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6 HTTP/1.1" 200 625 "" "harbor-registry-client"
time="2025-06-23T10:01:19.820529047Z" level=info msg="authorized request" go.version=go1.23.8 http.request.host="registry:5000" http.request.id=15763a7b-c3ba-4967-a29d-b9a644e6e8a2 http.request.method=DELETE http.request.remoteaddr="x.x.8.30:44044" http.request.uri="/v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6" http.request.useragent=harbor-registry-client vars.name="janek/test1" vars.reference="sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6"
x.x.8.30 - - [23/Jun/2025:10:01:19 +0000] "DELETE /v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6 HTTP/1.1" 202 0 "" "harbor-registry-client"
time="2025-06-23T10:01:19.920116344Z" level=info msg="response completed" go.version=go1.23.8 http.request.host="registry:5000" http.request.id=15763a7b-c3ba-4967-a29d-b9a644e6e8a2 http.request.method=DELETE http.request.remoteaddr="x.x.8.30:44044" http.request.uri="/v2/janek/test1/manifests/sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6" http.request.useragent=harbor-registry-client http.response.duration=388.118907ms http.response.status=202 http.response.written=0
...

jobservice:
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:162]: Garbage Collection parameters: [delete_untagged: true, dry_run: false, time_window: 2, workers: 1]
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:172]: start to run gc in job.
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:546]: start to delete untagged artifact (no actually deletion for dry-run mode)
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:580]: end to delete untagged artifact (no actually deletion for dry-run mode)
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:595]: artifact trash candidates.
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:597]: ID-38 MediaType-application/vnd.oci.image.config.v1+json ManifestMediaType-application/vnd.oci.image.manifest.v1+json RepositoryName-janek/test1 Digest-sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6 CreationTime-2025-06-23 07:21:54
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: blob eligible for deletion: sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: blob eligible for deletion: sha256:fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: blob eligible for deletion: sha256:85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:273]: 2 blobs and 1 manifests eligible for deletion
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:274]: The GC could free up 131 MB space, the size is a rough estimation.
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:339]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete the manifest with registry v2 API: janek/test1, application/vnd.oci.image.manifest.v1+json, sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:19Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://core/api/v2.0/internalconfig
2025-06-23T10:01:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:368]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete manifest from storage: sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:19Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://core/api/v2.0/internalconfig
2025-06-23T10:01:21Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:396]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete artifact blob record from database: 38, janek/test1, sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:21Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:404]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete artifact trash record from database: 38, janek/test1, sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:21Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete blob from storage: sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:24Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:451]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][1/3] delete blob record from database: 39, sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
2025-06-23T10:01:24Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][2/3] delete blob from storage: sha256:fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486
2025-06-23T10:01:26Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:451]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][2/3] delete blob record from database: 37, sha256:fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486
2025-06-23T10:01:26Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][3/3] delete blob from storage: sha256:85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c
2025-06-23T10:01:26Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://core/api/v2.0/internalconfig
2025-06-23T10:01:28Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:451]: [f1fe6707-331f-4c70-8a07-6afdf1628b83][3/3] delete blob record from database: 38, sha256:85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c
2025-06-23T10:01:28Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:480]: 2 blobs and 1 manifests are actually deleted
2025-06-23T10:01:28Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:481]: The GC job actual frees up 131 MB space.
2025-06-23T10:01:28Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:518]: cache clean up completed
2025-06-23T10:01:28Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:200]: success to run gc in job.
2025-06-23T10:01:28Z [INFO] [/jobservice/runner/redis.go:152]: Job 'GARBAGE_COLLECTION:fd9b8e866ac2cfe0ef019dd6' exit with success
# aws s3 ls s3://registry/ --recursive
2025-06-23 07:16:17       5510 docker/registry/v2/blobs/sha256/85/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c/data
2025-06-23 07:16:18        625 docker/registry/v2/blobs/sha256/bc/bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6/data
2025-06-23 07:16:16  138165992 docker/registry/v2/blobs/sha256/fa/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486/data
2025-06-23 07:16:17         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c/link
2025-06-23 07:16:16         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486/link

I downloaded the files via s3 cp to verify that they are not "empty".

Then I triggered the GC directly in the registry pod:

sh-5.2$ registry_DO_NOT_USE_GC garbage-collect --delete-untagged /etc/registry/config.yml

janek/test1

0 blobs marked, 3 blobs and 0 manifests eligible for deletion
blob eligible for deletion: sha256:85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c
INFO[0000] Deleting blob: /docker/registry/v2/blobs/sha256/85/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c  go.version=go1.23.8 instance.id=ec24fe74-2c56-497f-aba7-0089ff92dd52 service=registry
blob eligible for deletion: sha256:bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6
INFO[0000] Deleting blob: /docker/registry/v2/blobs/sha256/bc/bc20a1d4cfd32f3915d95b11d781c9f3d6035de1a9f41fd888de6fb551863dc6  go.version=go1.23.8 instance.id=ec24fe74-2c56-497f-aba7-0089ff92dd52 service=registry
blob eligible for deletion: sha256:fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486
INFO[0000] Deleting blob: /docker/registry/v2/blobs/sha256/fa/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486  go.version=go1.23.8 instance.id=ec24fe74-2c56-497f-aba7-0089ff92dd52 service=registry

And the blobs got deleted... but empty layer links remain.

# aws s3 ls s3://registry/ --recursive

2025-06-23 07:16:17         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/85f392b721880238e07f28855292cf69d1d30b87c5730441e020378571db565c/link
2025-06-23 07:16:16         71 docker/registry/v2/repositories/janek/test1/_layers/sha256/fa0600c9647164ba2afff0613cc464cd2c882d5a228057d41932a60f13be4486/link
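
One way to spot such leftovers is to cross-check a listing: a layer link is orphaned when the blob data object for its digest no longer exists. A minimal sketch in Python, assuming the key layout shown in the listings above. The function names are hypothetical, and it works on plain key strings (for example the last column of aws s3 ls --recursive), not on a live bucket:

```python
import re

# Matches <root>/repositories/<repo>/_layers/<algo>/<hex>/link
LAYER_LINK = re.compile(
    r"^(?P<root>.*)/repositories/(?P<repo>.+)/_layers/"
    r"(?P<algo>[^/]+)/(?P<hex>[0-9a-f]+)/link$"
)


def expected_blob_key(root, algo, hex_digest):
    """Key of the blob data object that a layer link points at."""
    return f"{root}/blobs/{algo}/{hex_digest[:2]}/{hex_digest}/data"


def orphaned_layer_links(keys):
    """Return layer link keys whose blob data object is not in `keys`."""
    present = set(keys)
    orphans = []
    for key in keys:
        m = LAYER_LINK.match(key)
        if m and expected_blob_key(m["root"], m["algo"], m["hex"]) not in present:
            orphans.append(key)
    return orphans
```

Applied to the listing above, both _layers/.../link keys would be reported, since neither blobs/sha256/85/... nor blobs/sha256/fa/... exists any more.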

It seems to be an issue with Distribution Registry and/or Harbor itself. We could work around it with a manual GC run to clean up the blobs, but overall it should be treated as an important bug.

Cheers, Janek

sizowie avatar Jun 23 '25 12:06 sizowie

Did you check the permission to your S3 bucket?

Vad1mo avatar Jun 24 '25 08:06 Vad1mo

Sure, otherwise nothing could be deleted.

tl;dr:

  1. Harbor GC only cleans up manifest objects from S3 (although the jobservice log reports that other objects, such as blobs, are deleted as well)
  2. Running registry_DO_NOT_USE_GC directly removes the orphaned blobs
  3. Still leftovers: empty layers
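
To make that comparison reproducible, the before/after listings can be diffed by storage area. A minimal sketch in Python, assuming the aws s3 ls --recursive output format shown above (date, time, size, key); the function names are hypothetical:

```python
def parse_listing(text):
    """Extract object keys from `aws s3 ls --recursive` output."""
    keys = set()
    for line in text.splitlines():
        parts = line.split(None, 3)  # date, time, size, key
        if len(parts) == 4 and parts[2].isdigit():
            keys.add(parts[3].strip())
    return keys


def removed_by_area(before, after):
    """Group the keys that disappeared by the area they lived in."""
    groups = {"_manifests": [], "_layers": [], "blobs": [], "other": []}
    for key in sorted(before - after):
        if "/_manifests/" in key:
            groups["_manifests"].append(key)
        elif "/_layers/" in key:
            groups["_layers"].append(key)
        elif "/blobs/" in key:
            groups["blobs"].append(key)
        else:
            groups["other"].append(key)
    return groups
```

For the two listings around the Harbor GC run, only the "_manifests" group would be non-empty, which matches point 1.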

sizowie avatar Jun 24 '25 08:06 sizowie

@sizowie Thanks for writing up this issue, are you using AWS S3 or S3 compatible storage when you see this problem?

reasonerjt avatar Jul 01 '25 07:07 reasonerjt

> @sizowie Thanks for writing up this issue, are you using AWS S3 or S3 compatible storage when you see this problem?

S3 compatible storage. Tested with Hitachi Content Platform and NetApp StorageGrid.

sizowie avatar Jul 01 '25 08:07 sizowie

The same issue is happening on Ceph S3 storage.

DanilKichai avatar Jul 07 '25 04:07 DanilKichai

@sizowie can you share the registryctl log? And is it possible to see the logs on the S3 server side? Harbor GC actually shares the same code base with Distribution: it calls the storage driver to delete files.

@sizowie can you also set up Docker Distribution v2.8.3 with the same object storage and try the GC there?

wy65701436 avatar Aug 06 '25 11:08 wy65701436

Sorry for the late response, I had no free time to set up a test environment for that and I also don't have access to the S3 infrastructure.

But I found the problem. For each blob that needs to be purged, Harbor's GC (registryctl) logs:

2025-10-01T13:47:34Z [DEBUG] [/lib/http/error.go:63]: {"errors":[{"code":"NOT_FOUND","message":"s3aws: Path not found: /docker/registry/v2/blobs/sha256/15/15c4ac6d4798ad6300c267fe5e09efecb4cb0afe4ce7a276cfeb50ce24a40a31"}]}

Unlike regular filesystems, S3 doesn't have "directory" objects. The path in the error message is such a directory, so it doesn't exist as an object in S3 (that's why we see "Path not found"). What does exist are the /docker/registry/v2/.../<sha>/data objects, and those are the ones that must be removed.
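
To illustrate the difference, here is a minimal Python sketch. It is not Harbor's or Distribution's actual code; the helper names are hypothetical, and the key layout (without a leading slash, as in the aws s3 ls output) is assumed from the listings earlier in this issue:

```python
def blob_dir(digest, root="docker/registry/v2"):
    """'Directory' path of a blob, as it appears in the error message."""
    algo, hex_digest = digest.split(":", 1)
    return f"{root}/blobs/{algo}/{hex_digest[:2]}/{hex_digest}"


def exists_as_object(path, object_keys):
    """An object store only knows exact keys, so this is an exact match."""
    return path in object_keys


def objects_under(path, object_keys):
    """What a 'directory delete' has to target on an object store:
    every key that starts with the directory path plus a slash."""
    prefix = path.rstrip("/") + "/"
    return sorted(k for k in object_keys if k.startswith(prefix))
```

With a bucket that contains only the .../<sha>/data key, exists_as_object(blob_dir(digest), keys) is False, while objects_under(blob_dir(digest), keys) returns that data key.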

sizowie avatar Oct 01 '25 18:10 sizowie

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Dec 01 '25 09:12 github-actions[bot]

This issue is still valid. I've run into it on a few different S3-compatible stores now.

endocrimes avatar Dec 01 '25 09:12 endocrimes

Make sure the registry deployment has the S3 credentials. Related PR: https://github.com/goharbor/harbor-helm/pull/1545

tranthang2404 avatar Dec 04 '25 07:12 tranthang2404