
BUG: DejaCode scan_single_package for previously failed scans results in bad request

Open rogu-beta opened this issue 11 months ago • 6 comments

Describe the bug

When DejaCode is tasked with analyzing an SBOM, it roughly performs two steps:

  1. Create a load_sbom pipeline in ScanCode.io and import the packages into the inventory
  2. If the "Scan all packages of this product post-import" is enabled, submit scan_single_package for each of the entries in the inventory

Due to unforeseen circumstances, a scan_single_package pipeline can fail in ScanCode.io. If one then attempts to load another SBOM or rerun the scan through Action > Scan all packages, the requests fail with HTTP 400 Bad Request responses: ScanCode.io rejects the requests to /api/projects with {"name":["project with this name already exists."]}.

It seems this can only be fixed by manually deleting the failed projects in ScanCode.io. For the end user, it is not clear why the package scan does not start, and if only some packages are affected, the problem may go entirely unnoticed, resulting in incomplete data for the product in DejaCode.
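To keep such conflicts from going unnoticed, a client could recognize this specific 400 response and report it to the user rather than only logging it. A minimal Python sketch, assuming the error body always has the shape quoted above (`{"name": [...]}`); the helper name `is_name_conflict` is hypothetical and not part of DejaCode.

```python
import json


def is_name_conflict(status_code: int, body: str) -> bool:
    """Return True if project creation failed because the name is already taken.

    Hypothetical helper: it only checks for the 400 body shape reported in
    this issue, {"name": ["project with this name already exists."]}.
    """
    if status_code != 400:
        return False
    try:
        errors = json.loads(body)
    except ValueError:
        # Body is not JSON, so it cannot be the name-conflict error.
        return False
    if not isinstance(errors, dict):
        return False
    messages = errors.get("name", [])
    return any("already exists" in str(message) for message in messages)
```

A True result could then be surfaced in the product view instead of disappearing into the logs.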

To Reproduce

Set up DejaCode to use a ScanCode.io instance.

Steps to reproduce the behavior:

  1. Create a test project in DejaCode
  2. Navigate to Action > Load Packages from SBOMs
  3. Select an SBOM and check "Update existing packages with discovered packages data" and "Scan all packages of this product post-import"
  4. Press "Load Packages"
  5. Interrupt the ScanCode.io worker in some way after load_sbom has completed (e.g., terminate it or interrupt its connection to the database)
  6. Either repeat steps 2-4 or use Action > Scan all packages

Observe in the logs that ScanCode.io responds with Bad Request and the pipeline is not restarted.

Expected behavior

The expected behavior is that DejaCode would either restart the pipeline if the project already exists, or delete and recreate the project. Alternatively, the behavior could be changed on ScanCode.io's side so that a request for an existing project simply restarts the pipeline.
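The delete-and-recreate option can be sketched as a single retry around project creation. This is an illustration under assumptions: the client interface (`create_project`, `delete_project_by_name`) and the `ProjectConflict` exception are hypothetical stand-ins for whatever calls DejaCode would make against ScanCode.io's REST API, not existing DejaCode or ScanCode.io functions.

```python
from typing import Protocol


class ProjectConflict(Exception):
    """Hypothetical: raised when a project with the requested name already exists."""


class ScanClient(Protocol):
    # Hypothetical interface; a real client would wrap ScanCode.io's REST API.
    def create_project(self, payload: dict) -> dict: ...
    def delete_project_by_name(self, name: str) -> None: ...


def create_or_recreate(client: ScanClient, payload: dict) -> dict:
    """Create the project; on a name conflict, delete the old one and retry once."""
    try:
        return client.create_project(payload)
    except ProjectConflict:
        # The earlier (failed or finished) project blocks the name, so remove it
        # and submit the same payload again. A second conflict propagates.
        client.delete_project_by_name(payload["name"])
        return client.create_project(payload)


class InMemoryClient:
    """Fake client that exercises the retry logic without a server."""

    def __init__(self, existing=()):
        self.projects = {name: {"name": name} for name in existing}
        self.deleted = []

    def create_project(self, payload: dict) -> dict:
        if payload["name"] in self.projects:
            raise ProjectConflict(payload["name"])
        self.projects[payload["name"]] = dict(payload)
        return self.projects[payload["name"]]

    def delete_project_by_name(self, name: str) -> None:
        self.deleted.append(name)
        self.projects.pop(name, None)
```

Deleting discards the old project's results. Restarting the pipeline on the existing project would keep them, but that depends on the ScanCode.io-side change suggested above.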

Screenshots

Example request and response (IPs, URLs, and tokens replaced with dummy data):

POST /api/projects/ HTTP/1.0
X-Forwarded-For: 198.51.100.150, 198.51.100.166
X-Forwarded-Host: scancodeio.example.com:8080
X-Forwarded-Proto: http
Host: scancodeio.example.com
Connection: close
Content-Length: 380
X-Request-ID: 9c6363a1bd7f8bc19898eaee2d34e5a5
X-Real-IP: 198.51.100.150
X-Forwarded-Port: 443
X-Forwarded-Scheme: https
X-Scheme: https
User-Agent: python-requests/2.32.3
Accept-Encoding: gzip, deflate
Accept: */*
Authorization: Token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: application/json

{"name": "f58d1b9466.cabd8d8bf5.ca4c555982", "input_urls": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz", "pipeline": "scan_single_package", "execute_now": true, "webhook_url": "https://dejacode.example.com/notifications/send_scan_notification/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:XXXXXX:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/"}

HTTP/1.0 400 Bad Request
Server: gunicorn
Date: Wed, 08 Jan 2025 08:14:47 GMT
Connection: close
Content-Type: application/json
Vary: Accept
Allow: GET, POST, HEAD, OPTIONS
X-Frame-Options: DENY
Content-Length: 51
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
Cross-Origin-Opener-Policy: same-origin

{"name":["project with this name already exists."]}
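For reference, the failing call can be reproduced with a short Python sketch that mirrors the JSON body above. The helper names are hypothetical, and `session` is assumed to be a `requests.Session` (the User-Agent shows python-requests); this is not DejaCode's actual code. Because a retry sends the same `name`, ScanCode.io answers it with the 400 shown.

```python
def build_scan_payload(name: str, input_url: str, webhook_url: str) -> dict:
    """Build the JSON body for POST /api/projects/, mirroring the request above."""
    return {
        "name": name,
        "input_urls": input_url,
        "pipeline": "scan_single_package",
        "execute_now": True,
        "webhook_url": webhook_url,
    }


def submit_scan(session, base_url: str, token: str, payload: dict):
    """POST the payload with token authentication; `session` is assumed to be a requests.Session."""
    return session.post(
        f"{base_url.rstrip('/')}/api/projects/",
        headers={"Authorization": f"Token {token}"},
        json=payload,  # requests serializes this and sets Content-Type: application/json
        timeout=30,  # assumed value; not taken from the issue
    )
```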

Context (OS, Browser, Device, etc.): n.a.

rogu-beta, Jan 08 '25

@ghsa-retrieval Thanks very much for providing all the pertinent details.

DennisClark, Jan 08 '25

It seems this also happens with successfully scanned projects in ScanCode.io. This can cause problems when trying to fix a package import that was seemingly not completed properly. For instance, in my test the usage policy was not assigned to some packages, but I could not scan the packages again to trigger the assignment.

rogu-beta, Feb 07 '25

Ideally, one should be able to pick whether to:

  • Repeat the entire scan
  • Only repeat the scan for previously failed pipelines, and reimport the results of those that already completed successfully

The latter would be significantly more efficient than rerunning the entire analysis of all packages when there are only a handful of failures.
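The second option amounts to partitioning the product's packages by the status of their last scan. A minimal sketch, assuming status strings modeled on ScanCode.io's pipeline run statuses (`success`, `failure`, `stopped`, `stale`, `running`, ...); the function name, the `RESCAN_STATUSES` set, and the package-to-status mapping are hypothetical.

```python
from typing import Dict, List, Tuple

# Statuses treated as "needs a new scan". This set is an assumption for illustration.
RESCAN_STATUSES = {"failure", "stopped", "stale"}


def plan_rescan(statuses: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split packages into (to_rescan, to_reimport).

    `statuses` maps a package identifier to the status of its last scan.
    Failed or interrupted scans are rescanned; successful ones are only
    reimported. Other states (e.g. still running) are left alone.
    """
    to_rescan, to_reimport = [], []
    for package, status in sorted(statuses.items()):
        if status in RESCAN_STATUSES:
            to_rescan.append(package)
        elif status == "success":
            to_reimport.append(package)
    return to_rescan, to_reimport
```

Repeating the entire scan is then the special case of putting every package in `to_rescan`.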

rogu-beta, Feb 10 '25

> The expected behavior is that DejaCode would either restart the pipeline if it already exists or deletes and recreates it.
> Perhaps behavior could also be changed on ScanCode.io's side where a call to an existing project simply restarts the pipeline.

@ghsa-retrieval The merged PR https://github.com/aboutcode-org/dejacode/pull/281 added a new section in the Product inventory tab to start and delete package scans. This should help to address this issue.

tdruez, May 13 '25

Thank you very much! I'll give it a try together with the other fix once I'm done with my other tasks. I will probably not have enough time to do so today, but definitely tomorrow.

rogu-beta, May 13 '25

Solution is perfect! Great UX, state is immediately visible and easy to use.

rogu-beta, May 14 '25