Inconsistent Behavior When Triggering Trivy Scan via Harbor API
Expected behavior and actual behavior:
Expected behavior: When sending a request to initialize a Trivy scan of an artifact through the Harbor API, I expect the scan to consistently either succeed or fail, since neither the artifact nor the Trivy scanner configuration changes between requests.
Actual behavior: Some attempts succeed, while others return a 400 Bad Request error with the following message:
The configured scanner Trivy does not support scanning artifact with mime type application/vnd.docker.distribution.manifest.v2+json
While the inconsistency might be an issue with our configuration and environment, at least the error message can't be correct. Screenshots showing both successful and failed requests are attached to this issue.
Steps to reproduce the problem:
- Set up Harbor registry (v2.10.2) and Trivy (goharbor/trivy-adapter-photon:v2.10.2) in a Kubernetes cluster.
- Attempt to initialize a scan of an artifact via the Harbor API by sending a POST request to the /scan endpoint (a minimal request sketch follows this list).
- Observe the inconsistent behavior.
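For reference, this is roughly how we hit the endpoint. Host, project, repository, reference and credentials below are placeholders for our environment; the path is Harbor's artifact scan API:

```go
// Minimal reproduction sketch. Host, project, repository, reference and the
// credentials are placeholders, not the real values from our environment.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// POST /api/v2.0/projects/{project}/repositories/{repository}/artifacts/{reference}/scan
	url := "https://harbor.example.com/api/v2.0/projects/myproject/repositories/myrepo/artifacts/latest/scan"

	client := &http.Client{Timeout: 30 * time.Second}
	for i := 1; i <= 10; i++ {
		req, err := http.NewRequest(http.MethodPost, url, nil)
		if err != nil {
			panic(err)
		}
		req.SetBasicAuth("admin", "<password>")

		resp, err := client.Do(req)
		if err != nil {
			fmt.Printf("attempt %d: request error: %v\n", i, err)
			continue
		}
		resp.Body.Close()

		// 202 means the scan was accepted; any other status is the
		// inconsistency we are trying to catch.
		fmt.Printf("attempt %d: %s\n", i, resp.Status)
		time.Sleep(2 * time.Second)
	}
}
```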
Versions:
- Harbor version: v2.10.2-1a741cb7 (from image goharbor/registry-photon:v2.10.2)
- Trivy image: goharbor/trivy-adapter-photon:v2.10.2
- Kubernetes: v1.30.3
Additional context:
- 409 Response (screenshot attached)
- 202 Response (screenshot attached)
@kon-foo Could you please reproduce the issue (both success and failure in scan) and collect the logs of nginx, harbor-core, harbor-jobservice and trivy-adapter pods?
Could you please also let me know how Harbor was deployed in your env? What makes you feel it may be an issue with your configuration and environment?
@reasonerjt Thanks for looking into this. Harbor was deployed using this helm chart. These are the images in use:
| Component | Image |
|---|---|
| harbor-core | goharbor/harbor-core:v2.10.2 |
| harbor-database | goharbor/harbor-db:v2.10.2 |
| harbor-jobservice | goharbor/harbor-jobservice:v2.10.2 |
| harbor-portal | goharbor/harbor-portal:v2.10.2 |
| harbor-redis | goharbor/redis-photon:v2.10.2 |
| harbor-registry | goharbor/registry-photon:v2.10.2 & goharbor/harbor-registryctl:v2.10.2 |
| harbor-trivy | goharbor/trivy-adapter-photon:v2.10.2 |
Here are the logs:
- harbor-core.log
- harbor-jobservice.log
- nginx2.log
- Trivy didn't log anything.
This time I actually had to hit the API ~40 times before getting a 202. Core failed to ping the scanner 39 times:
2024-10-01T05:34:15Z [ERROR] [/controller/scanner/base_controller.go:299][error="v1 client: get metadata: Get "http://release-registry-harbor-trivy:8080/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" requestID="27fa31aade1ae4e2a9e422a198fd0544"]: failed to ping scanner
2024-10-01T05:34:15Z [ERROR] [/controller/scanner/base_controller.go:265]: api controller: get project scanner: scanner controller: ping: v1 client: get metadata: Get "http://release-registry-harbor-trivy:8080/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Before finally succeeding:
2024-10-01T05:35:45Z [INFO] [/server/middleware/security/robot.go:71][requestID="9553b0d0-4924-4218-93fc-e4d8891358f3"]: a robot security context generated for request GET /service/token
2024-10-01T05:35:53Z [INFO] [/pkg/task/dao/execution.go:471]: scanned out 1 executions with outdate status, refresh status to db
2024-10-01T05:35:53Z [INFO] [/pkg/task/dao/execution.go:512]: refresh outdate execution status done, 1 succeed, 0 failed
> What makes you feel it may be an issue with your configuration and environment?
I added that only to emphasize that even if this had something to do with our configuration/environment, I would still consider it unwanted behavior, because the mime type is not the problem and the error message is misleading. I wasn't the one who deployed Harbor in our cluster, and I'm not aware of any unusual configuration, but the failing scanner pings make me suspect a networking or permissions misconfiguration.
Thanks for your help and let me know if you need further information.
I had some time to dig deeper and was able to locate the issue.
First of all, the "inconsistency" stems from our Trivy container sometimes not responding within the hardcoded 5s timeout of the REST client. So that's not on Harbor.
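For context, the symptom on Harbor's side is just a client-side timeout; it can be reproduced against the adapter's metadata endpoint with a plain 5-second client (the service URL is taken from the logs above, everything else is a minimal sketch):

```go
// Minimal sketch: probe the Trivy adapter's metadata endpoint with the same
// 5s budget that the ping errors in the logs suggest Harbor's client uses.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}

	resp, err := client.Get("http://release-registry-harbor-trivy:8080/api/v1/metadata")
	if err != nil {
		// A slow adapter surfaces here as "context deadline exceeded
		// (Client.Timeout exceeded while awaiting headers)".
		fmt.Println("ping failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("ping ok:", resp.Status)
}
```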
The actual bug is in how these timeouts are handled in GetRegistrationByProject and Scan:
if opts.Ping {
// Get metadata of the configured registration
meta, err := bc.Ping(ctx, registration)
if err != nil {
// Not blocked, just logged it
log.Error(errors.Wrap(err, "api controller: get project scanner"))
registration.Health = statusUnhealthy
} else {
...
}
}
return registration, nil
In case of an error in bc.Ping, the error is just logged, the registration is marked unhealthy, and it is returned with an empty Metadata.
The Scan method, however, only checks that a registration exists, not that it is healthy, and therefore proceeds to compare the artifact's mime type against an empty Metadata object, producing the confusing error above.
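To make the failure mode concrete, here is a simplified, self-contained illustration. The types and the HasCapability helper are stand-ins for Harbor's registration model, not its actual code: with Metadata never populated, no consumed mime type can ever match, so the caller sees the "does not support ... mime type" error even though the real problem was the failed ping.

```go
// Simplified stand-in types; Harbor's real registration/metadata structs differ in detail.
package main

import "fmt"

type Capability struct {
	ConsumesMimeTypes []string
}

type Metadata struct {
	Capabilities []Capability
}

type Registration struct {
	Name     string
	Health   string
	Metadata Metadata // left empty when the ping failed
}

// HasCapability mirrors the kind of check Scan performs against the metadata.
func (r Registration) HasCapability(mimeType string) bool {
	for _, c := range r.Metadata.Capabilities {
		for _, m := range c.ConsumesMimeTypes {
			if m == mimeType {
				return true
			}
		}
	}
	return false
}

func main() {
	// Registration as returned after a failed ping: unhealthy, metadata empty.
	reg := Registration{Name: "Trivy", Health: "unhealthy"}

	mime := "application/vnd.docker.distribution.manifest.v2+json"
	if !reg.HasCapability(mime) {
		// This is the misleading error the API surfaces instead of the ping failure.
		fmt.Printf("the configured scanner %s does not support scanning artifact with mime type %s\n", reg.Name, mime)
	}
}
```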
Fix
Assuming there is a reason for only setting registration.Health = statusUnhealthy instead of returning the error, the easy fix would be to check the registration's health in Scan. I could create a PR for that if you want me to, @reasonerjt?
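Continuing the simplified stand-in model from the illustration above (so again a sketch, not Harbor's actual Scan code), the guard I have in mind would look roughly like this:

```go
// Sketch of the proposed guard, reusing the Registration stand-in from the
// previous illustration; Harbor's real Scan method is structured differently.
const statusUnhealthy = "unhealthy"

func scan(reg *Registration, mimeType string) error {
	if reg == nil {
		return fmt.Errorf("no scanner registration configured for the project")
	}
	// Fail fast with the real cause instead of falling through to the
	// mime-type comparison against an empty Metadata object.
	if reg.Health == statusUnhealthy {
		return fmt.Errorf("scanner %s is unhealthy: ping failed", reg.Name)
	}
	if !reg.HasCapability(mimeType) {
		return fmt.Errorf("the configured scanner %s does not support scanning artifact with mime type %s", reg.Name, mimeType)
	}
	// ... trigger the actual scan job from here ...
	return nil
}
```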
However, it might be worth reconsidering whether swallowing this error is a good idea, and checking whether all callers of GetRegistrationByProject handle registration.Health correctly. I am not familiar enough with Harbor or Go to do that myself.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.