GC log not found when GC fails
Expected behavior and actual behavior:
GC fails and when trying to get the log we get:
{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 6e0aaeb546660763a8a745dc\"}"}]}
The sweeper is set to 14 days, and we checked immediately after the job failed.
Steps to reproduce the problem:
Run GC; when it fails, click on the "Logs" link.
Versions: Please specify the versions of following systems.
- harbor version: 2.5.3
- docker engine version: N/A
- docker-compose version: N/A
Additional context:
job_loggers:
  - name: "DB"
    level: INFO
    sweeper:
      duration: 14
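For reference, a minimal sketch of fetching the GC run's log directly over the REST API (the same call the UI "Logs" link makes), assuming the Harbor v2 endpoint /api/v2.0/system/gc/{gc_id}/log; the host, credentials, and GC run ID below are placeholders:

// Minimal sketch (not from the issue): fetch a GC run's log over the Harbor
// v2 API, which is what the UI "Logs" link calls. harbor.example.com, the
// credentials, and the GC run ID 42 are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	url := "https://harbor.example.com/api/v2.0/system/gc/42/log" // GC run ID from the GC history page

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("admin", os.Getenv("HARBOR_ADMIN_PASSWORD"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// For the failed run described above, this returns 404 with the NOT_FOUND payload.
	fmt.Printf("status: %s\n%s\n", resp.Status, body)
}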
Could you check the jobservice logs?
> Could you check the jobservice logs?
I did, and can't find anything of relevance.
Unless you can tell me what to look for? Perhaps I'm missing something.
@chlins I combed through the jobservice logs again and found this:
2022-09-14T20:36:56Z [INFO] [/jobservice/worker/cworker/reaper.go:181]: Reaper: mark job 4746d1529505908316d7f391 failed as job is still not finished in 1 day
4746d1529505908316d7f391 is indeed the GC job, as I was seeing this UUID return a 404 when trying to fetch the log:
2022-09-15T00:08:44Z [WARNING] [/pkg/joblog/dao/dao.go:51]: no job log founded. Query condition, uuid: 4746d1529505908316d7f391, err: no job log founded
2022-09-15T00:08:44Z [ERROR] [/jobservice/api/handler.go:289]: Serve http request 'GET /api/v1/jobs/4746d1529505908316d7f391/log' error: 404 {"code":10010,"message":"object is not found","details":"log entity: 4746d1529505908316d7f391"}
GC is very slow, and it looks like it's terminated after 24 hours. Is this configurable? Can we allow it to run longer?
Also, if the job is terminated like this, shouldn't the log from the period it was running still be stored?
It seems that the name of the job_logger is invalid; see the sample configuration, which only supports "FILE" and "STD_OUTPUT":
job_loggers:
  - name: "STD_OUTPUT" # logger backend name, only support "FILE" and "STD_OUTPUT"
    level: "INFO" # INFO/DEBUG/WARNING/ERROR/FATAL
  - name: "FILE"
    level: "INFO"
    settings: # Customized settings of logger
      base_dir: "/var/log/jobs"
    sweeper:
      duration: 1 # days
      settings: # Customized settings of sweeper
        work_dir: "/var/log/jobs"
Could you please update it, trigger another GC job manually, and check whether the new GC job writes the log?
> It seems that the name of the job_logger is invalid; see the sample configuration, which only supports "FILE" and "STD_OUTPUT"
Hi, @stonezdj
This is strange; we've used the DB job logger since forever, and it is a supported option in the official Helm chart (which is how we deploy Harbor): https://github.com/goharbor/harbor-helm/blob/2efdb425c1a4f27f7c09cb0c6a4a3f3c175c326b/templates/jobservice/jobservice-cm.yaml#L36
Are you saying that this option was removed or deprecated?
To be clear, all the other job logs work as expected (retention, replication, scanning, etc.).
@chlins @stonezdj any feedback on this?
Having the same issue: the GC job took more than 24 hours. I have ~2.66 TB (Ceph RadosGW as the S3 storage backend on a 10GiB network) and growing, and it's a vicious circle: the more images I have, the longer GC takes. Is this reaper 24h limit configurable?
UPDATE: Seems it's not: https://github.com/goharbor/harbor/blob/a4c577f9ec4f18396207a5e686433a6ba203d4ef/src/jobservice/worker/cworker/reaper.go#L34
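For illustration only, a simplified sketch of the kind of check that deadline implies (this is not the actual Harbor source; the constant and field names are invented, and the real logic lives in the linked reaper.go):

// Illustrative only: a reaper-style sweep that marks any job as failed if it
// has not finished within a fixed 24h window. Names are invented.
package main

import (
	"fmt"
	"time"
)

const maxJobDuration = 24 * time.Hour // the hard-coded one-day deadline

type trackedJob struct {
	ID        string
	StartedAt time.Time
	Finished  bool
}

func reap(jobs []trackedJob, now time.Time) {
	for _, j := range jobs {
		if !j.Finished && now.Sub(j.StartedAt) > maxJobDuration {
			fmt.Printf("Reaper: mark job %s failed as job is still not finished in 1 day\n", j.ID)
		}
	}
}

func main() {
	gc := trackedJob{ID: "4746d1529505908316d7f391", StartedAt: time.Now().Add(-25 * time.Hour)}
	reap([]trackedJob{gc}, time.Now())
}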
@chlins @dkulchinsky Guys, what should I do? Does anyone have a solution to this?
Harbor 2.7.0
OK, the file log shows this in real time:
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [1/153781] delete blob from storage: sha256:731ecb5881599214f2be881db8f480f1e1193237aa53c5abf1e20b6213c92936
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [1/153781] delete blob record from database: 1585867, sha256:731ecb5881599214f2be881db8f480f1e1193237aa53c5abf1e20b6213c92936
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [2/153781] delete blob from storage: sha256:db73ab979baf8b084f86127c1917f5b0ac4d5899dc1def985f7a978a0caf7e5e
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [2/153781] delete blob record from database: 1564204, sha256:db73ab979baf8b084f86127c1917f5b0ac4d5899dc1def985f7a978a0caf7e5e
2023-01-21T23:30:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [3/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:301]: [3/153781] delete manifest from storage: sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:329]: [3/153781] delete artifact blob record from database: 345542, xxx, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:337]: [3/153781] delete artifact trash record from database: 345542, xxx, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [3/153781] delete blob from storage: sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [3/153781] delete blob record from database: 1584361, sha256:a07ee000965ed480f3839ce6226cbd4a33fc7e18f67787be5c65156984a324b5
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:355]: [4/153781] delete blob from storage: sha256:49bf23fef74969db26d7895d71a646b5afb77a50f649f75c661546eb61d0abec
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:384]: [4/153781] delete blob record from database: 1596574, sha256:49bf23fef74969db26d7895d71a646b5afb77a50f649f75c661546eb61d0abec
2023-01-21T23:33:41Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [5/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:5f20a0a983e40bba4b657069fc721787a62562bab1c830e0412ab2acb46c2351
...
2023-01-21T23:55:20Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [61/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:0f3fd2086ba7ba5f78b9d0dc00441eddf0745fa778b4da86eb0b3c15ed2ecd8e
...
2023-01-22T03:47:44Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [517/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:aab6fc9435637ed0292d6848096eb59925fb5dc8f0cf0205d3258777c6e65d36
...
2023-01-22T05:34:31Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: [717/153781] delete the manifest with registry v2 API: xxx, application/vnd.docker.distribution.manifest.v2+json, sha256:c95e604a79594cb64b8a5422fe41bfe9c6f500c1edd6c59c11a9bfcec2742531
At this speed it will run practically forever and will not fit in 24 hours. The question is: why is the job's update time not being moved forward?
> @chlins @dkulchinsky Guys, what should I do? Does anyone have a solution to this?
No solution available here AFAIK; even if the deadline were configurable it wouldn't matter much, given how slow GC becomes when the registry grows beyond ~1TB...
Actually, the GC job is still running in jobservice, but you have to get the job ID and check the logs in the file system. Maybe this feature could display the running log of the GC job: #17886
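If it helps others, a minimal sketch of reading that file log directly from inside the jobservice container, assuming the FILE logger's base_dir of /var/log/jobs from the sample config above and a <job_id>.log file name (both are assumptions, not confirmed in this thread):

// Minimal sketch: print a running job's file log from inside the jobservice
// container. The base_dir "/var/log/jobs" and the <job_id>.log naming are
// assumptions based on the sample config above.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	jobID := "4746d1529505908316d7f391" // GC job ID taken from the jobservice logs
	logPath := filepath.Join("/var/log/jobs", jobID+".log")

	data, err := os.ReadFile(logPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "could not read %s: %v\n", logPath, err)
		os.Exit(1)
	}
	fmt.Print(string(data))
}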
@stonezdj that would be a useful feature for sure; however, it's still not clear how this would work if the logs are stored in the DB.
It is the same for logs stored in the DB.
Closed with PR #17886.