GC can delete images that are not supposed to be deleted
Expected behavior and actual behavior: We have observed several times that, in some cases, GC deletes images that were not scheduled for deletion. The result is that the information is still present in the Harbor UI and database, but the data is missing from S3.
The GC logs also show that the manifest was deleted.
Steps to reproduce the problem:
In the UI the image is still visible
The image manifest SHA starts with d4f9a6cf.
```sh
# GC log
2024-06-04T04:13:09Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:238]: blob eligible for deletion: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:366]: [108/1438] delete blob from storage: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:395]: [108/1438] delete blob record from database: 5040, sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
```
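To check whether a blob logged by GC is actually gone from the bucket, the digest can be mapped to its object key. This is a sketch assuming the default docker/distribution registry storage layout (`docker/registry/v2/...`) under the bucket's registry root; verify the prefix against your own deployment:

```go
package main

import (
	"fmt"
	"strings"
)

// blobPath returns the default docker/distribution storage path for a digest,
// relative to the registry root in the S3 bucket (layout assumption).
func blobPath(digest string) string {
	hex := strings.TrimPrefix(digest, "sha256:")
	return fmt.Sprintf("docker/registry/v2/blobs/sha256/%s/%s/data", hex[:2], hex)
}

func main() {
	// Digest taken from the GC log above.
	fmt.Println(blobPath("sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585"))
	// Existence can then be checked with e.g.:
	//   aws s3 ls "s3://<bucket>/<registry-root>/<path printed above>"
}
```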
Trying to pull this image results in `manifest unknown` instead of the `not found` error that would be returned if the image no longer existed at all.
We observed that other operations were running at the database level during the GC run, indicating that we had run out of DB connections:
```sh
2024-06-04T04:22:39Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.Handler of topic PUSH_ARTIFACT: failed to connect to `host=harbor-pg-database user=harbor database=harbor`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
```
Versions:
- harbor version: 2.7.x, 2.9.x, 2.10.x
Additional context:
- Log files: No other errors in the logs besides DB SQLSTATE 53300
- GC Job completed successfully
Maybe related to https://github.com/beego/beego/issues/5255, resolved by https://github.com/goharbor/harbor/pull/20452.
Similar issue: https://github.com/goharbor/harbor/issues/19401
The issue may be caused by the beego ORM, as it does not propagate errors during data scanning. In some extreme cases, such as when a connection becomes unusable mid-scan, the ORM returns incorrect data, leading to wrong blob deletion candidates. We are working on upgrading Beego in this pull request: https://github.com/goharbor/harbor/pull/20555
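The failure mode described above can be illustrated with a simplified, hypothetical pure-Go sketch (not Harbor's actual code): if the query that loads still-referenced digests fails partway and the error is swallowed, the reference set comes back truncated and blobs that are still in use look like deletion candidates. The safe behavior is to abort GC on any scan error:

```go
package main

import (
	"errors"
	"fmt"
)

// loadReferencedBlobs simulates scanning the digests still referenced by
// artifacts. If the scan fails partway (e.g. connection slots exhausted,
// SQLSTATE 53300), it must surface the error instead of silently returning
// a truncated set.
func loadReferencedBlobs(rows []string, failAt int) (map[string]bool, error) {
	refs := make(map[string]bool)
	for i, d := range rows {
		if i == failAt {
			return nil, errors.New("remaining connection slots are reserved")
		}
		refs[d] = true
	}
	return refs, nil
}

// deletionCandidates marks every blob absent from refs as deletable.
func deletionCandidates(all []string, refs map[string]bool) []string {
	var out []string
	for _, d := range all {
		if !refs[d] {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}

	// Healthy scan: only the truly unreferenced blob is a candidate.
	refs, err := loadReferencedBlobs([]string{"sha256:aaa", "sha256:bbb"}, -1)
	if err != nil {
		panic(err)
	}
	fmt.Println(deletionCandidates(all, refs)) // [sha256:ccc]

	// Truncated scan: if the error were ignored, refs would be missing
	// sha256:bbb and GC would delete a blob that is still in use.
	if _, err := loadReferencedBlobs([]string{"sha256:aaa", "sha256:bbb"}, 1); err != nil {
		fmt.Println("abort GC:", err)
	}
}
```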
To mitigate the issue, you can schedule garbage collection during low-usage time slots.
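For reference, Harbor's custom GC schedule takes a cron expression; as far as I can tell it uses a six-field format with a leading seconds field (verify against your Harbor version's documentation). An expression like the following would run GC at 03:00 every Sunday:

```sh
0 0 3 * * 0
```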
@Vad1mo, does Harbor 2.12 with the fix solve this issue for you? We are experiencing the same issue, and we have lots of containers broken because of missing layers.
@hudymi yes, it's in v2.12. Can you confirm whether the missing layers were removed by GC? If so, could you also confirm that the removed digests did not belong to any artifacts that were in use at the time the GC ran?
@wy65701436 are there any steps I should follow to check this? Our check was simply based on a `docker pull`, followed by checking whether the missing layer was present in S3 (it was not).
@hudymi, did you launch the GC with the "Allow garbage collection on untagged artifacts" option?
@prgss yes
One thing: we are on Harbor 2.11, and I asked whether 2.12 fixes the problem so that we can re-enable GC after the upgrade.
@wy65701436 We upgraded from 2.10.3 to 2.12.2 and GC deleted images that had valid tags.