BUG: DejaCode scan_single_package for previously failed scans results in bad request
Describe the bug
When DejaCode is tasked with analyzing an SBOM it roughly performs two steps:
- Create a load_sbom pipeline in ScanCode.io and import the packages into the inventory
- If the "Scan all packages of this product post-import" is enabled, submit scan_single_package for each of the entries in the inventory
Due to unforeseen circumstances, a scan_single_package pipeline can fail in ScanCode.io. If one then attempts to load another SBOM or rerun the scan through Action > Scan all packages, this results in HTTP 400 Bad Request responses. ScanCode.io rejects the requests to /api/projects with {"name":["project with this name already exists."]}.
It seems this can only be fixed by manually deleting the failed projects in ScanCode.io. For the end user it is not clear why the package scan does not start, and if only some packages are affected, it may even go entirely unnoticed, resulting in incomplete data for the product in DejaCode.
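The rejection described above can be recognized from the status code and the JSON body. The following is a minimal sketch, not DejaCode's actual code: the function name `is_duplicate_project_error` is hypothetical, and the check relies only on the response shape shown in this report.

```python
import json


def is_duplicate_project_error(status_code: int, body: str) -> bool:
    """Return True if a response looks like the duplicate-name rejection.

    Hypothetical helper: it only inspects the status code and the "name"
    field of the JSON body, as seen in the response quoted in this report.
    """
    if status_code != 400:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    messages = payload.get("name", [])
    if not isinstance(messages, list):
        return False
    return any("already exists" in str(message) for message in messages)
```

A caller could use such a check to tell a name conflict apart from other 400 responses and surface a clearer message to the user.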
To Reproduce
Set up DejaCode to use a ScanCode.io instance.
Steps to reproduce the behavior:
- Create a test project in DejaCode
- Navigate to Action > Load Packages from SBOMs
- Select an SBOM and check "Update existing packages with discovered packages data" and "Scan all packages of this product post-import"
- Press "Load Packages"
- Interrupt the ScanCode.io worker in some way after load_sbom has completed, e.g. terminate it or interrupt its connection to the DB
- Either repeat steps 2-4 or use Action > Scan all packages
Observe in the logs that ScanCode.io responds with Bad Request errors and that the pipelines are not restarted.
Expected behavior
DejaCode should either restart the pipeline if it already exists or delete and recreate it. Perhaps the behavior could also be changed on ScanCode.io's side, so that a call for an existing project simply restarts the pipeline.
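The "delete and recreate" variant of the expected behavior could look roughly like the sketch below. Everything here is an assumption for illustration: `FakeScanCodeIO` is an in-memory stand-in rather than the real ScanCode.io API, and `DuplicateProjectError`, `create_project`, `delete_project`, and `submit_scan` are hypothetical names.

```python
from dataclasses import dataclass, field


class DuplicateProjectError(Exception):
    """Raised by the hypothetical client when the project name is taken."""


@dataclass
class FakeScanCodeIO:
    """In-memory stand-in for a ScanCode.io client (not the real API)."""

    projects: dict = field(default_factory=dict)

    def create_project(self, name, input_url, pipeline):
        # Mirrors the reported behavior: an existing name is rejected.
        if name in self.projects:
            raise DuplicateProjectError(name)
        project = {"input_url": input_url, "pipeline": pipeline, "status": "running"}
        self.projects[name] = project
        return project

    def delete_project(self, name):
        self.projects.pop(name, None)


def submit_scan(client, name, input_url, pipeline="scan_single_package"):
    """Create the scan project; on a name conflict, delete it and retry once."""
    try:
        return client.create_project(name, input_url, pipeline)
    except DuplicateProjectError:
        client.delete_project(name)
        return client.create_project(name, input_url, pipeline)
```

Deleting unconditionally would also discard results of scans that completed successfully, so a real implementation would probably want to check the existing project's state first.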
Screenshots
Example request and response (IPs, URLs and tokens replaced with dummy data):
POST /api/projects/ HTTP/1.0
X-Forwarded-For: 198.51.100.150, 198.51.100.166
X-Forwarded-Host: scancodeio.example.com:8080
X-Forwarded-Proto: http
Host: scancodeio.example.com
Connection: close
Content-Length: 380
X-Request-ID: 9c6363a1bd7f8bc19898eaee2d34e5a5
X-Real-IP: 198.51.100.150
X-Forwarded-Port: 443
X-Forwarded-Scheme: https
X-Scheme: https
User-Agent: python-requests/2.32.3
Accept-Encoding: gzip, deflate
Accept: */*
Authorization: Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: application/json
{"name": "f58d1b9466.cabd8d8bf5.ca4c555982", "input_urls": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz", "pipeline": "scan_single_package", "execute_now": true, "webhook_url": "https://dejacode.example.com/notifications/send_scan_notification/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:XXXXXX:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/"}
HTTP/1.0 400 Bad Request
Server: gunicorn
Date: Wed, 08 Jan 2025 08:14:47 GMT
Connection: close
Content-Type: application/json
Vary: Accept
Allow: GET, POST, HEAD, OPTIONS
X-Frame-Options: DENY
Content-Length: 51
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
Cross-Origin-Opener-Policy: same-origin
{"name":["project with this name already exists."]}
Context (OS, Browser, Device, etc.): n.a.
@ghsa-retrieval thanks very much for providing all the pertinent details.
It seems this even happens with successfully scanned projects in ScanCode.io. This can cause problems when trying to fix a package import that was seemingly not properly completed. For instance, in my test the usage policy was not assigned to some packages, but I could not scan the packages again to trigger the assignment.
Ideally, one should be able to pick whether to:
- Repeat the entire scan
- Only repeat the scan for previously failed pipelines and reimport the results of those that already completed successfully
The latter would be significantly more efficient than rerunning the entire analysis of all packages when there are only a handful of failures.
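The second option amounts to splitting the packages by their last scan status. A minimal sketch, assuming a hypothetical mapping from package identifier to status and the status strings "success" and "failure" (the names ScanCode.io actually reports may differ):

```python
def plan_rescan(scan_statuses):
    """Split packages into those to rescan and those to reimport.

    `scan_statuses` is a hypothetical mapping of package identifier to the
    status of its last scan. Packages in any other state (for example still
    running) are left alone.
    """
    to_rescan, to_reimport = [], []
    for package, status in scan_statuses.items():
        if status == "failure":
            to_rescan.append(package)
        elif status == "success":
            to_reimport.append(package)
    return to_rescan, to_reimport
```

With a handful of failures in a large inventory, only the packages in the first list would need a new scan_single_package run.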
> The expected behavior is that DejaCode would either restart the pipeline if it already exists or deletes and recreates it.
> Perhaps behavior could also be changed on ScanCode.io's side where a call to an existing project simply restarts the pipeline.
@ghsa-retrieval The merged PR https://github.com/aboutcode-org/dejacode/pull/281 added a new section in the Product inventory tab to start and delete package scans. This should help to address this issue.
Thank you very much! I'll give it a try together with the other fix once I'm done with my other tasks. I will probably not have enough time to do so today, but definitely tomorrow.
Solution is perfect! Great UX, state is immediately visible and easy to use.