artifact scan stuck in queue status unable to re-scan

Open atchadwick opened this issue 1 year ago • 7 comments

Hi,

I'm using Harbor 2.11, and some artifacts have been stuck in a queued state for the Trivy vulnerability scan for a few days. When I try to stop the scan, it fails with the error "no scan job for artifact". I have checked the job workers and there are currently no queued scans.

Because of this, I am unable to stop the scan and issue another one. Is it possible to force a stop somewhere? Ideally, if Harbor can't find the scan job for the artifact, it would reset the queued status so that the artifact can be scanned again.

atchadwick avatar Jul 22 '24 10:07 atchadwick

Maybe related with https://github.com/goharbor/harbor/issues/19486.

chlins avatar Jul 24 '24 06:07 chlins

For the artifacts that are stuck, even after deleting them and running a GC, pushing the same artifact with a different tag gives this output:

624033d3e11d: Mounted from app12378/wiremock
c6e36f73d00a: Mounted from app12378/wiremock
501689fa7b84: Mounted from app12378/wiremock
37c89cb71f6f: Mounted from app12378/wiremock
0e0c99000308: Mounted from app12378/wiremock
08357c8b2220: Mounted from app12378/wiremock
f4462d5b2da2: Mounted from app12378/wiremock

and the scan is still stuck in the queued state.

Could it be that the layer it's trying to mount is stuck in a scan and is therefore blocking this one as well?

atchadwick avatar Jul 24 '24 20:07 atchadwick

Sorry, I can't fully understand your problem. Could you please elaborate in more detail?

Could you please share the error message you get when you are unable to stop the scan, including trivy.log and core.log?

zyyw avatar Jul 29 '24 06:07 zyyw

It may be worth ignoring my last comment. I think it was a knock-on effect of the images stuck in queued: deleting the artifact stuck in queued, running a GC, and uploading again resolved the upload issue for that image.

This is the error message shown when pressing the stop button while the scan was stuck in queued:

image

I'm running the v1.15.0 Helm chart in a Kubernetes cluster. When I trigger a stop on the stuck artifact's scan job and get the error, nothing appears in the trivy/core logs.

We had an issue where bandwidth was slow and around 300 images were queued for scanning. During that time the trivy and job service pods, which use ephemeral storage, restarted. Could this have caused scans to get stuck in the queue? Would it be possible to provide a force reset of a scan, even if only as an API call?
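
For reference, the Harbor v2.0 REST API exposes a stop-scan action on an artifact. It is not a force reset, and it may well fail with the same "no scan job for artifact" error as the stop button. The Python sketch below only builds the request URL. The host name and the helper name `stop_scan_url` are made up for illustration, and authentication and the actual POST are left out.

```python
from urllib.parse import quote


def stop_scan_url(base_url, project, repository, reference):
    """Build the URL for Harbor's stop-scan action on one artifact.

    `repository` is the repository name without the project prefix.
    Harbor's API expects a name containing "/" to be URL-encoded twice
    (a/b -> a%2Fb -> a%252Fb).
    """
    repo = quote(quote(repository, safe=""), safe="")
    return (
        f"{base_url.rstrip('/')}/api/v2.0"
        f"/projects/{quote(project, safe='')}"
        f"/repositories/{repo}"
        f"/artifacts/{quote(reference, safe=':')}"
        f"/scan/stop"
    )


# Hypothetical host and shortened digest, for illustration only.
url = stop_scan_url(
    "https://harbor.example.com", "app06", "martini/test-experience", "sha256:abc"
)
print(url)
```

The URL would be sent as an authenticated POST request. Whether it clears a scan whose job no longer exists is exactly the open question in this thread.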

atchadwick avatar Aug 04 '24 16:08 atchadwick

Just an update on this: I could see an error in the logs for the sweep:

2024-08-07T13:00:03Z [INFO] [/pkg/task/sweep_job.go:150]: [IMAGE_SCAN] start to sweep, retain latest 1 executions
2024-08-07T13:16:32Z [INFO] [/pkg/task/sweep_job.go:160]: [IMAGE_SCAN] listed 921 candidate executions for sweep
2024-08-07T13:16:33Z [ERROR] [/pkg/task/sweep_job.go:110]: [IMAGE_SCAN] failed to run sweep, error: failed to delete executions: ERROR: update or delete on table "execution" violates foreign key constraint "task_execution_id_fkey" on table "task" (SQLSTATE 23503)
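
For anyone checking whether they hit the same sweep failure, a small Python sketch like the one below can pick the failing sweep lines out of a job service log. It assumes one entry per line in the format shown above (timestamp, level, source location, message). The function name `sweep_errors` is made up for illustration.

```python
import re

# Matches lines shaped like:
# 2024-08-07T13:16:33Z [ERROR] [/pkg/task/sweep_job.go:110]: message
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\]: (?P<msg>.*)$"
)


def sweep_errors(lines):
    """Return (timestamp, message) pairs for ERROR lines logged by sweep_job.go."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "ERROR" and "sweep_job.go" in m.group("source"):
            found.append((m.group("ts"), m.group("msg")))
    return found


sample = [
    "2024-08-07T13:00:03Z [INFO] [/pkg/task/sweep_job.go:150]: "
    "[IMAGE_SCAN] start to sweep, retain latest 1 executions",
    '2024-08-07T13:16:33Z [ERROR] [/pkg/task/sweep_job.go:110]: '
    '[IMAGE_SCAN] failed to run sweep, error: failed to delete executions: '
    'ERROR: update or delete on table "execution" violates foreign key '
    'constraint "task_execution_id_fkey" on table "task" (SQLSTATE 23503)',
]

for ts, msg in sweep_errors(sample):
    print(ts, msg)
```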

I went back to the image from my last comment and can see it is now in an error state, which allowed me to run a scan again, and that scan completed.

I poked around in the DB and found about 1,700 tasks, mostly in Error with a few in Running:

SELECT * FROM task WHERE execution_id IN (SELECT id FROM execution WHERE vendor_type = 'IMAGE_SCAN' AND status IN ('Running', 'Error', 'Pending'));

(1713 rows)
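
A per-status count is easier to read than 1713 raw rows. The Python sketch below shows the shape of such a query against an in-memory SQLite database. The two tables are a toy stand-in containing only the columns mentioned in this thread, not Harbor's real PostgreSQL schema, and the sample rows are invented.

```python
import sqlite3

# Toy stand-in for the execution and task tables (assumption, not Harbor's schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE execution (id INTEGER PRIMARY KEY, vendor_type TEXT, status TEXT);
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        execution_id INTEGER REFERENCES execution(id)
    );
    INSERT INTO execution VALUES
        (1, 'IMAGE_SCAN', 'Running'),
        (2, 'IMAGE_SCAN', 'Error'),
        (3, 'IMAGE_SCAN', 'Error'),
        (4, 'IMAGE_SCAN', 'Success'),
        (5, 'GARBAGE_COLLECTION', 'Running');
    INSERT INTO task VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
    """
)


def scan_status_counts(conn):
    """Count tasks per execution status for unfinished or failed image scans."""
    rows = conn.execute(
        """
        SELECT e.status, COUNT(*)
        FROM task t
        JOIN execution e ON e.id = t.execution_id
        WHERE e.vendor_type = 'IMAGE_SCAN'
          AND e.status IN ('Running', 'Error', 'Pending')
        GROUP BY e.status
        ORDER BY e.status
        """
    ).fetchall()
    return dict(rows)


print(scan_status_counts(conn))
```

The query is read-only, so it does not touch the state of any scan.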

I have 21 scans stuck in Running, which the GUI shows as queued, and they are not actually running in the job queue. Looking at them in the GUI, I get the same error as in the screenshot above.

Output of a running example in the DB:

24096 | IMAGE_SCAN | 15688 | Running | | MANUAL | {"artifact":{"digest":"sha256:f6e30135a203881a0038f704aab515a664d9a9e786bc620f4918c9d0fb63f","id":15688,"project_id":273,"repository_name":"app06/martini/test-experience/test-experience-dr"},"operator":"robot+app06+ab ","registration":{"id":1,"name":"Trivy"}} | 2024-07-18 09:08:28.902456 | | 5 | 2024-07-18 16:35:40

Is it okay/possible to set the Running as Error?

Do you have any recommendations on cleaning this up please?

atchadwick avatar Aug 07 '24 18:08 atchadwick

Facing the exact same issue here.

snoop2048 avatar Aug 07 '24 18:08 snoop2048

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Oct 07 '24 09:10 github-actions[bot]

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

github-actions[bot] avatar Nov 07 '24 09:11 github-actions[bot]

@zyyw the same issue here, please reopen

StefanSa avatar Mar 02 '25 17:03 StefanSa

We have the same issue. We can't start an image scan: it has the status "Queued" and I can't stop it in order to start it again. Please reopen this issue.

maksptr avatar Jun 19 '25 04:06 maksptr