PurlDB-requested package index scans can fail for larger packages with large scans and many scans at once

Open keshav-space opened this issue 10 months ago • 0 comments

send_scan_project_results in purldb-scan-queue-worker is not able to reliably send large scan results.

The critical problems were fixed in this issue:

  • https://github.com/nexB/purldb/issues/362

And we can still improve this, using some of these approaches:

  • [ ] Send compressed scan results: I was able to compress a ~350MB scan result to under 25MB using zstd or gzip; xz is far too slow. As a start, using the built-in gzip compression of HTTP clients and servers should help a lot.
  • [ ] Adapt timeout to payload: The 60-second DEFAULT_TIMEOUT is not sufficient for sending scan results. We should have a variable timeout that depends on the size of the scan results.
  • [ ] Consider paginated upload in multiple chunks or streamed upload using JSON lines.
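
The three ideas above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual purldb-scan-queue-worker code: the helper names (`compress_scan_results`, `timeout_for_payload`, `iter_json_lines`) and the assumed throughput of 1 MB/s are hypothetical and would need tuning against real deployments.

```python
import gzip
import json
import math

# Mirrors the 60-second DEFAULT_TIMEOUT mentioned above.
BASE_TIMEOUT_SECONDS = 60


def compress_scan_results(scan_results):
    """Serialize scan results to JSON and gzip-compress the bytes."""
    raw = json.dumps(scan_results).encode("utf-8")
    return gzip.compress(raw)


def timeout_for_payload(size_bytes, assumed_bytes_per_second=1_000_000):
    """
    Return a timeout in seconds that grows with the payload size.
    The throughput value is an assumption, not a measured figure.
    """
    extra_seconds = math.ceil(size_bytes / assumed_bytes_per_second)
    return BASE_TIMEOUT_SECONDS + extra_seconds


def iter_json_lines(records):
    """Yield one newline-terminated JSON document per record (JSON Lines)."""
    for record in records:
        yield (json.dumps(record) + "\n").encode("utf-8")


# Hypothetical scan data, used only to exercise the helpers.
example_results = {"files": [{"path": f"src/file_{i}.py"} for i in range(1000)]}
body = compress_scan_results(example_results)
timeout = timeout_for_payload(len(body))
lines = list(iter_json_lines(example_results["files"]))
```

With an HTTP client, the compressed `body` would typically be sent with a `Content-Encoding: gzip` header and the computed `timeout`, assuming the receiving PurlDB endpoint is set up to decompress request bodies. The generator from `iter_json_lines` could be passed as a streamed request body so the full result never has to sit in memory at once.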

keshav-space, Apr 01 '24 14:04