GC log not found when GC fails
Expected behavior and actual behavior:
GC fails and when trying to get the log we get:
{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 6e0aaeb546660763a8a745dc\"}"}]}
The sweeper is set to 14 days, and we checked immediately after the job failed.
Steps to reproduce the problem:
Run GC; when it fails, click on the "Logs" link.
Versions: Please specify the versions of following systems.
- harbor version: 2.5.3
- docker engine version: N/A
- docker-compose version: N/A
Additional context:
job_loggers:
  - name: "DB"
    level: INFO
    sweeper:
      duration: 14
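For reference, a minimal sketch of fetching the GC run's log directly over the REST API (the same call the UI "Logs" link makes), assuming the Harbor v2 endpoint /api/v2.0/system/gc/{gc_id}/log; the host, credentials, and GC run ID below are placeholders:

// Minimal sketch (not from the issue): fetch a GC run's log over the Harbor
// v2 API, which is what the UI "Logs" link calls. harbor.example.com, the
// credentials, and the GC run ID 42 are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	url := "https://harbor.example.com/api/v2.0/system/gc/42/log" // GC run ID from the GC history page

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("admin", os.Getenv("HARBOR_ADMIN_PASSWORD"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// For the failed run described above, this returns 404 with the NOT_FOUND payload.
	fmt.Printf("status: %s\n%s\n", resp.Status, body)
}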
Could you check the jobservice logs?
> Could you check the jobservice logs?
I did, and can't find anything of relevance.
Unless you can tell me what to look for? Perhaps I'm missing something.
@chlins I combed through the jobservice logs again and found this:
2022-09-14T20:36:56Z [INFO] [/jobservice/worker/cworker/reaper.go:181]: Reaper: mark job 4746d1529505908316d7f391 failed as job is still not finished in 1 day
4746d1529505908316d7f391 is indeed the GC job, as I was seeing this UUID return a 404 when trying to fetch the log:
2022-09-15T00:08:44Z [WARNING] [/pkg/joblog/dao/dao.go:51]: no job log founded. Query condition, uuid: 4746d1529505908316d7f391, err: no job log founded
2022-09-15T00:08:44Z [ERROR] [/jobservice/api/handler.go:289]: Serve http request 'GET /api/v1/jobs/4746d1529505908316d7f391/log' error: 404 {"code":10010,"message":"object is not found","details":"log entity: 4746d1529505908316d7f391"}
GC is very slow, and it looks like it's terminated after 24 hours. Is this configurable? Can we allow it to run longer?
Also, if the job is terminated like this, shouldn't the log from the period it was running still be stored?
It seems that the name of the job_logger is invalid; see the sample configuration, which only supports "FILE" and "STD_OUTPUT":
job_loggers:
  - name: "STD_OUTPUT" # logger backend name, only support "FILE" and "STD_OUTPUT"
    level: "INFO" # INFO/DEBUG/WARNING/ERROR/FATAL
  - name: "FILE"
    level: "INFO"
    settings: # Customized settings of logger
      base_dir: "/var/log/jobs"
    sweeper:
      duration: 1 # days
      settings: # Customized settings of sweeper
        work_dir: "/var/log/jobs"
Could you please update it, trigger another GC job manually, and check whether the new GC job writes the log?
> It seems that the name of the job_logger is invalid; see the sample configuration, which only supports "FILE" and "STD_OUTPUT"
Hi, @stonezdj
This is strange; we've used the DB job logger since forever, and it is a supported option in the official Helm chart (which is how we deploy Harbor): https://github.com/goharbor/harbor-helm/blob/2efdb425c1a4f27f7c09cb0c6a4a3f3c175c326b/templates/jobservice/jobservice-cm.yaml#L36
Are you saying that this option was removed or deprecated?
To be clear, all the other job logs work as expected (retention, replication, scanning, etc.).
@chlins @stonezdj any feedback on this?
Having the same issue: the GC job took more than 24 hours. I have ~2.66 TB (Ceph RadosGW as the S3 storage backend on a 10GiB network) and growing, and it's a vicious circle: the more images I have, the longer GC takes. Is this reaper 24h limit configurable?
UPDATE: Seems it's not: https://github.com/goharbor/harbor/blob/a4c577f9ec4f18396207a5e686433a6ba203d4ef/src/jobservice/worker/cworker/reaper.go#L34
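For illustration only, a simplified sketch of the kind of check that deadline implies (this is not the actual Harbor source; the constant and field names are invented, and the real logic lives in the linked reaper.go):

// Illustrative only: a reaper-style sweep that marks any job as failed if it
// has not finished within a fixed 24h window. Names are invented.
package main

import (
	"fmt"
	"time"
)

const maxJobDuration = 24 * time.Hour // the hard-coded one-day deadline

type trackedJob struct {
	ID        string
	StartedAt time.Time
	Finished  bool
}

func reap(jobs []trackedJob, now time.Time) {
	for _, j := range jobs {
		if !j.Finished && now.Sub(j.StartedAt) > maxJobDuration {
			fmt.Printf("Reaper: mark job %s failed as job is still not finished in 1 day\n", j.ID)
		}
	}
}

func main() {
	gc := trackedJob{ID: "4746d1529505908316d7f391", StartedAt: time.Now().Add(-25 * time.Hour)}
	reap([]trackedJob{gc}, time.Now())
}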
@chlins @dkulchinsky Guys, what should I do? Does anyone have a solution to this?
Harbor 2.7.0
OK, the file log shows this in real time:
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [1/153781] delete blob from storage: sha256:731ecb5881599214f2be881db8f480f1e1193237aa53c5abf1e20b6213c92936
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [1/153781] delete blob record from database: 1585867, sha256:731ecb5881599214f2be881db8f480f1e1193237aa53c5abf1e20b6213c92936
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [2/153781] delete blob from storage: sha256:db73ab979baf8b084f86127c1917f5b0ac4d5899dc1def985f7a978a0caf7e5e
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [2/153781] delete blob record from database: 1564204, sha256:db73ab979baf8b084f86127c1917f5b0ac4d5899dc1def985f7a978a0caf7e5e
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [3/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:301]: [3/153781] delete manifest from storage: sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:329]: [3/153781] delete artifact blob record from database: 345542, xxx, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:337]: [3/153781] delete artifact trash record from database: 345542, xxx, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [3/153781] delete blob from storage: sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [3/153781] delete blob record from database: 1584361, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [4/153781] delete blob from storage: sha256:49bf23fef74969db26d7895d71a646b5afb77a50f649f75c661546eb61d0abec
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [4/153781] delete blob record from database: 1596574, sha256:49bf23fef74969db26d7895d71a646b5afb77a50f649f75c661546eb61d0abec
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [5/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:5f20a0a983e40bba4b657069fc721787a62562bab1c830e0412ab2acb46c2351
...
2023-01-21T23:55:20Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [61/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:0f3fd2086ba7ba5f78b9d0dc00441eddf0745fa778b4da86eb0b3c15ed2ecd8e
...
2023-01-22T03:47:44Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [517/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:aab6fc9435637ed0292d6848096eb59925fb5dc8f0cf0205d3258777c6e65d36
...
2023-01-22T05:34:31Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [717/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:c95e604a79594cb64b8a5422fe41bfe9c6f500c1edd6c59c11a9bfcec2742531
At this speed it will run practically forever and will not fit in 24 hours. The question is: why is the job's update time not being moved forward?
> @chlins @dkulchinsky Guys, what should I do? Does anyone have a solution to this?
No solution available here AFAIK; even if the deadline were configurable it wouldn't matter much, given how slow GC becomes when the registry grows beyond ~1TB...
Actually, the GC job is still running in jobservice, but you have to get the job ID and check the logs in the file system. Maybe this feature could display the running log of the GC job: #17886
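If it helps others, a minimal sketch of reading that file log directly from inside the jobservice container, assuming the FILE logger's base_dir of /var/log/jobs from the sample config above and a <job_id>.log file name (both are assumptions, not confirmed in this thread):

// Minimal sketch: print a running job's file log from inside the jobservice
// container. The base_dir "/var/log/jobs" and the <job_id>.log naming are
// assumptions based on the sample config above.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	jobID := "4746d1529505908316d7f391" // GC job ID taken from the jobservice logs
	logPath := filepath.Join("/var/log/jobs", jobID+".log")

	data, err := os.ReadFile(logPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "could not read %s: %v\n", logPath, err)
		os.Exit(1)
	}
	fmt.Print(string(data))
}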
@stonezdj that would be a useful feature for sure; however, it's still not clear how this would work if the logs are stored in the DB.
It is the same for logs stored in the DB.
Closed with PR #17886.