
GC can delete images that are not supposed to be deleted.

Open • Vad1mo opened this issue 1 year ago • 4 comments

Expected behavior and actual behavior: We have already observed this a few times. In some cases, GC deletes images that were not scheduled for deletion. As a result, the image is still present in the Harbor UI and database but missing from S3.

The GC logs also show that the manifest was deleted.

Steps to reproduce the problem:

In the UI, the image is still visible.

The image manifest digest starts with d4f9a6cf.

```sh
# GC log
2024-06-04T04:13:09Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:238]: blob eligible for deletion: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585

2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:366]: [108/1438] delete blob from storage: sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
2024-06-04T04:14:30Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:395]: [108/1438] delete blob record from database: 5040, sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585
```

Trying to pull this image fails with `manifest unknown`, rather than the `not found` error we would expect if the image no longer existed.



We observed that other operations were hitting the database during the GC run, and the following error indicates that we ran out of DB connections:

```sh
2024-06-04T04:22:39Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.Handler of topic PUSH_ARTIFACT: failed to connect to `host=harbor-pg-database user=harbor database=harbor`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
```
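For context on SQLSTATE 53300: PostgreSQL holds back `superuser_reserved_connections` of its `max_connections` slots for superusers, so an ordinary role such as the `harbor` user in the log above is refused once the free slots shrink to that reserve. The helper below is a minimal sketch of that arithmetic, not Harbor code; `availableSlots` and the numbers in `main` are hypothetical, and the model ignores replication and background-worker slots.

```go
package main

import "fmt"

// availableSlots returns how many more connections an ordinary (non-superuser)
// role can open before PostgreSQL rejects it with SQLSTATE 53300.
// Simplified model: maxConnections and superuserReserved mirror the
// max_connections and superuser_reserved_connections settings; inUse counts
// the regular client backends currently connected.
func availableSlots(maxConnections, superuserReserved, inUse int) int {
	free := maxConnections - superuserReserved - inUse
	if free < 0 {
		return 0
	}
	return free
}

func main() {
	// Hypothetical numbers: 100 max connections, 3 reserved for superusers,
	// and 97 backends already held by Harbor core, jobservice and GC workers.
	fmt.Println(availableSlots(100, 3, 97)) // prints 0: the next non-superuser login fails
	fmt.Println(availableSlots(100, 3, 60)) // prints 37
}
```

In other words, a busy GC run plus normal push traffic only needs to consume `max_connections - superuser_reserved_connections` connections before every new Harbor connection attempt starts failing.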

Versions:

  • Harbor versions: 2.7.x, 2.9.x, 2.10

Additional context:

  • Log files: no errors in the logs other than the DB error SQLSTATE 53300
  • The GC job completed successfully

Vad1mo • Jun 04 '24 12:06

Maybe related to https://github.com/beego/beego/issues/5255, which was resolved by https://github.com/goharbor/harbor/pull/20452.

Vad1mo • Jun 04 '24 13:06

Similar issue: https://github.com/goharbor/harbor/issues/19401

wy65701436 • Jun 05 '24 09:06

The issue may be caused by the beego ORM, which does not propagate errors that occur while scanning query results. In some extreme cases, such as when the database runs out of connections, the ORM returns incorrect data, which leads to the wrong blobs being selected as deletion candidates. We're working on upgrading beego in this pull request: https://github.com/goharbor/harbor/pull/20555
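The failure mode described above can be illustrated with a simplified mark-and-sweep model. This is a sketch, not Harbor's or beego's code: `rowIter`, `markUnsafe`, `markSafe` and `sweepCandidates` are hypothetical. It shows one way "incorrect data" can arise: if an error during row scanning is swallowed, the mark phase ends early with a truncated set of referenced blobs, and the sweep phase then treats live blobs as garbage. `markSafe` does the equivalent of checking `rows.Err()` after a `database/sql` loop and aborts instead.

```go
package main

import (
	"errors"
	"fmt"
)

// rowIter is a hypothetical stand-in for a database cursor that yields
// referenced blob digests and can fail part-way through (e.g. lost connection).
type rowIter struct {
	rows   []string
	failAt int // index at which iteration fails; -1 means never
	pos    int
	err    error
}

func (it *rowIter) Next() bool {
	if it.failAt >= 0 && it.pos == it.failAt {
		it.err = errors.New("connection error while scanning")
		return false
	}
	if it.pos >= len(it.rows) {
		return false
	}
	it.pos++
	return true
}

func (it *rowIter) Value() string { return it.rows[it.pos-1] }
func (it *rowIter) Err() error     { return it.err }

// markUnsafe collects referenced digests but never checks Err, so a failure
// silently yields a truncated "in use" set.
func markUnsafe(it *rowIter) map[string]bool {
	refs := map[string]bool{}
	for it.Next() {
		refs[it.Value()] = true
	}
	return refs
}

// markSafe does the same but surfaces the error so the caller can abort GC.
func markSafe(it *rowIter) (map[string]bool, error) {
	refs := map[string]bool{}
	for it.Next() {
		refs[it.Value()] = true
	}
	if err := it.Err(); err != nil {
		return nil, fmt.Errorf("mark phase failed, aborting GC: %w", err)
	}
	return refs, nil
}

// sweepCandidates returns every blob that is not in the referenced set.
func sweepCandidates(all []string, refs map[string]bool) []string {
	var out []string
	for _, d := range all {
		if !refs[d] {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	inUse := []string{"sha256:aaa", "sha256:bbb"} // only ccc is real garbage

	// Iteration fails after the first row, so bbb is never marked.
	refs := markUnsafe(&rowIter{rows: inUse, failAt: 1})
	fmt.Println(sweepCandidates(all, refs)) // prints [sha256:bbb sha256:ccc]

	if _, err := markSafe(&rowIter{rows: inUse, failAt: 1}); err != nil {
		fmt.Println(err)
	}
}
```

The design point is that a mark phase must fail closed: an incomplete "referenced" set is far more dangerous than skipping a GC run.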

As a mitigation for now, you can schedule garbage collection during low-usage periods.

wy65701436 • Jun 05 '24 09:06

> maybe related to beego/beego#5255 resolved by #20452

We have this PR for it:

  • https://github.com/goharbor/harbor/pull/20555

zyyw • Jun 16 '24 07:06

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] • Aug 15 '24 09:08

@Vad1mo, does Harbor 2.12, which includes the fix, solve this issue for you? We are experiencing the same issue, and many of our containers are broken because of missing layers.

hudymi • Dec 02 '24 11:12

@hudymi yes, the fix is in v2.12. Can you confirm whether the missing layers were removed by GC? If so, could you also confirm that the removed digests did not belong to any artifacts in use at the time of GC execution?

wy65701436 • Jan 20 '25 08:01

@wy65701436, are there any steps I should follow to verify this? Our check was simply running `docker pull` and then checking whether the missing layer was in S3 (it was not).
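To check a specific layer in S3 as described above, you need its object key. In the distribution registry storage layout, which Harbor's registry uses, a blob's content lives under `docker/registry/v2/blobs/<algorithm>/<first two hex characters>/<hex>/data`, prefixed by the S3 driver's `rootdirectory` if one is configured. The helper below is a minimal sketch of that mapping; `blobDataKey` is hypothetical, and the prefix should be confirmed against your own storage configuration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// blobDataKey builds the S3 object key where the registry stores a blob:
// <rootdirectory>/docker/registry/v2/blobs/<algorithm>/<first 2 hex>/<hex>/data
// rootDir is the storage driver's rootdirectory setting (may be empty).
func blobDataKey(rootDir, digest string) (string, error) {
	algo, hex, ok := strings.Cut(digest, ":")
	if !ok || len(hex) < 2 {
		return "", fmt.Errorf("malformed digest %q", digest)
	}
	p := path.Join("/", rootDir, "docker/registry/v2/blobs", algo, hex[:2], hex, "data")
	return strings.TrimPrefix(p, "/"), nil // S3 keys carry no leading slash
}

func main() {
	key, err := blobDataKey("", "sha256:d4f9a6cf78a2482148fd3a429c1d2019bf27a3cee1dc74856344a5e03c521585")
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

The resulting key can then be checked with, for example, `aws s3api head-object --bucket <your-bucket> --key <key>`; a 404 means the blob's data is gone even if Harbor still lists the artifact.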

hudymi • Mar 11 '25 10:03

@hudymi, did you launch the GC with the "Allow garbage collection on untagged artifacts" option?

prgss • Mar 21 '25 09:03

@prgss yes

hudymi • Mar 21 '25 10:03

To clarify: we are on Harbor 2.11. I asked whether 2.12 fixes the problem so that we know whether we can re-enable GC after upgrading.

hudymi • Mar 21 '25 10:03

@wy65701436 We upgraded from 2.10.3 to 2.12.2 and GC deleted images that had valid tags.

piyush94 • May 08 '25 11:05