Artifact scan stuck in queued status, unable to re-scan
Hi,
I'm using Harbor 2.11, and some artifacts have been stuck in a queued state for vulnerability scanning by Trivy for a few days. When I try to stop the scan, it fails with the error "no scan job for artifact". I have checked the job workers and there are no queued scans currently.
Because of this I am unable to stop the scan so that I can issue another one. Is it possible to force a stop somewhere? Ideally, if it can't find the scan job for the artifact, it would reset the queued status so that the artifact can be scanned again.
Possibly related to https://github.com/goharbor/harbor/issues/19486.
Even after the stuck artifacts are deleted and a GC is run, uploading the same artifact with a different tag gives this output:
624033d3e11d: Mounted from app12378/wiremock
c6e36f73d00a: Mounted from app12378/wiremock
501689fa7b84: Mounted from app12378/wiremock
37c89cb71f6f: Mounted from app12378/wiremock
0e0c99000308: Mounted from app12378/wiremock
08357c8b2220: Mounted from app12378/wiremock
f4462d5b2da2: Mounted from app12378/wiremock
and the scan is still stuck in the queued state.
Could it be that the layer it's trying to mount is stuck in a scan and is therefore blocking this one as well?
Sorry, I can't fully understand your problem. Could you please elaborate on it in more detail?
Could you please share the error message you get when you are unable to stop the scan, along with trivy.log and core.log?
It may be worth ignoring my last comment. I think it was a knock-on effect of the images stuck in queued: deleting the artifact stuck in queued, running a GC and uploading again resolved the upload issue for that image.
As for the error message, it appears when pressing the stop button while the scan is stuck in queued.
I'm running the v1.15.0 Helm chart in a Kubernetes cluster. When I trigger a stop on the stuck artifact's scan job and get the error, nothing appears in the trivy/core logs.
We had an issue where bandwidth was slow and around 300 images were queued for scanning. During that time the Trivy and job service pods, which use ephemeral storage, restarted. Could this have caused the scans to get stuck in the queue? Would it be possible to provide a force reset of a scan, even if it was just an API call?
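For anyone scripting this across many artifacts: to my understanding the stop button maps to a `scan/stop` endpoint in the Harbor v2.0 REST API, next to the `scan` endpoint that triggers a scan. The sketch below only builds those URLs and sends nothing. The host name, the `latest` reference and the split of `app06/...` into project and repository are assumptions for illustration, and the paths should be checked against the Swagger spec of your Harbor version. Calling the stop endpoint would probably hit the same "no scan job for artifact" error as the UI, so this is not a force reset.

```python
from urllib.parse import quote


def scan_url(base_url, project, repository, reference, stop=False):
    """Build the (assumed) Harbor v2.0 URL to trigger or stop an artifact scan."""
    # Repository names containing "/" are URL-encoded twice (a/b -> a%252Fb).
    repo = quote(quote(repository, safe=""), safe="")
    path = (
        f"/api/v2.0/projects/{quote(project, safe='')}"
        f"/repositories/{repo}"
        f"/artifacts/{quote(reference, safe=':')}/scan"
    )
    if stop:
        path += "/stop"
    return base_url.rstrip("/") + path


# Hypothetical host and reference; both endpoints are called with POST.
stop_url = scan_url(
    "https://harbor.example.com",
    "app06",
    "martini/test-experience/test-experience-dr",
    "latest",
    stop=True,
)
print(stop_url)
```

The resulting URL would be sent as an authenticated POST, for example with `curl -X POST -u <user>:<password>`.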
Just an update on this: I could see an error in the logs for the sweep job:
2024-08-07T13:00:03Z [INFO] [/pkg/task/sweep_job.go:150]: [IMAGE_SCAN] start to sweep, retain latest 1 executions
2024-08-07T13:16:32Z [INFO] [/pkg/task/sweep_job.go:160]: [IMAGE_SCAN] listed 921 candidate executions for sweep
2024-08-07T13:16:33Z [ERROR] [/pkg/task/sweep_job.go:110]: [IMAGE_SCAN] failed to run sweep, error: failed to delete executions: ERROR: update or delete on table "execution" violates foreign key constraint "task_execution_id_fkey" on table "task" (SQLSTATE 23503)
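SQLSTATE 23503 is PostgreSQL's foreign key violation: an `execution` row cannot be deleted while a `task` row still references it. The sketch below reproduces that behaviour with a simplified, made-up two-table schema in in-memory SQLite. It is not Harbor's real schema, and it does not explain why the sweep found leftover tasks in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
# Simplified stand-ins for the tables named in the error message.
conn.execute("CREATE TABLE execution (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE task ("
    "id INTEGER PRIMARY KEY, "
    "execution_id INTEGER REFERENCES execution(id), "
    "status TEXT)"
)
conn.execute("INSERT INTO execution VALUES (1, 'Running')")
conn.execute("INSERT INTO task VALUES (10, 1, 'Running')")


def delete_execution(conn, execution_id):
    """Try to delete an execution; return False if a task still references it."""
    try:
        conn.execute("DELETE FROM execution WHERE id = ?", (execution_id,))
        return True
    except sqlite3.IntegrityError:
        return False


blocked = delete_execution(conn, 1)  # a task still points at execution 1
conn.execute("DELETE FROM task WHERE execution_id = ?", (1,))
allowed = delete_execution(conn, 1)  # no referencing task is left
print(blocked, allowed)
```

In this toy setup the delete only goes through once the referencing task rows are gone, which is the ordering the constraint in the log enforces.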
I have gone back to the image in my last comment and I can see it is now in an error state, which allowed me to run a scan again, and that scan has completed.
I poked around in the DB and I can see about 1700 rows, mostly in Error and a few in Running:
SELECT * FROM task WHERE execution_id IN (SELECT id FROM execution WHERE vendor_type = 'IMAGE_SCAN' AND status IN ('Running', 'Error', 'Pending'));
(1713 rows)
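To see how those rows split between Error, Running and Pending, the same filter can be grouped by status. The sketch below runs such a query against a few made-up rows in in-memory SQLite. Only the `vendor_type` and `status` column names come from the query above; the sample data is invented, and against Harbor itself the SQL would be run read-only through psql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample rows; only the column names follow the query in this thread.
conn.executescript("""
CREATE TABLE execution (id INTEGER PRIMARY KEY, vendor_type TEXT, status TEXT);
INSERT INTO execution VALUES
  (1, 'IMAGE_SCAN', 'Error'),
  (2, 'IMAGE_SCAN', 'Error'),
  (3, 'IMAGE_SCAN', 'Running'),
  (4, 'IMAGE_SCAN', 'Success'),
  (5, 'OTHER_JOB',  'Running');
""")

BREAKDOWN_SQL = """
SELECT status, COUNT(*) AS n
FROM execution
WHERE vendor_type = 'IMAGE_SCAN'
  AND status IN ('Running', 'Error', 'Pending')
GROUP BY status
ORDER BY status
"""

# Map each status to the number of matching scan executions.
breakdown = dict(conn.execute(BREAKDOWN_SQL).fetchall())
print(breakdown)
```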
I have 21 scans stuck in Running, which the GUI shows as queued, and they are not actually running in the job queue. Looking at them in the GUI, I get the same error as in the screenshot above.
Output of a running example in the DB:
24096 | IMAGE_SCAN | 15688 | Running | | MANUAL | {"artifact":{"digest":"sha256:f6e30135a203881a0038f704aab515a664d9a9e786bc620f4918c9d0fb63f","id":15688,"project_id":273,"repository_name":"app06/martini/test-experience/test-experience-dr"},"operator":"robot+app06+ab ","registration":{"id":1,"name":"Trivy"}} | 2024-07-18 09:08:28.902456 | | 5 | 2024-07-18 16:35:40
Is it okay/possible to set the Running ones to Error?
Do you have any recommendations on cleaning this up please?
Facing the exact same issue here.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.
@zyyw the same issue here, please reopen
We have the same issue. We can't start an image scan: it has the status "Queued" and we can't stop it in order to start it again. Please reopen this issue.