scancode.io
PurlDB-requested package index scans can fail for larger packages, when scan results are large and many scans run at once
`send_scan_project_results` in `purldb-scan-queue-worker` isn't able to properly send the large scan results.
This issue fixed the critical problems:
- https://github.com/nexB/purldb/issues/362
And we can still improve this, using some of these approaches:
- [ ] Send compressed scan results: I was able to compress the ~350MB scan result to under 25MB using zstd or gzip; xz is way too slow. For a start, using the built-in gzip compression of HTTP clients and servers should help a lot.
- [ ] Adapt timeout to payload: The 60-second `DEFAULT_TIMEOUT` is not sufficient for sending scan results, we should have a variable timeout depending on the size of the scan results.
- [ ] Consider paginated upload in multiple chunks or streamed upload using JSON lines.
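
To make the three ideas above concrete, here is a minimal standard-library sketch, not the actual scancode.io or purldb code. The helper names (`compress_results`, `timeout_for`, `iter_json_lines_chunks`) and the tuning constants are hypothetical. The throughput figure in particular is an assumption that would need measuring.

```python
import gzip
import json

# Hypothetical tuning values, not taken from scancode.io or purldb.
BASE_TIMEOUT = 60  # seconds, same value as the 60-second DEFAULT_TIMEOUT
ASSUMED_BYTES_PER_SECOND = 1_000_000  # assumed worst-case upload throughput


def compress_results(results):
    """Serialize scan results to JSON and gzip the resulting bytes."""
    raw = json.dumps(results).encode("utf-8")
    return gzip.compress(raw)


def timeout_for(payload_size):
    """Return a timeout in seconds that grows with the payload size in bytes."""
    return BASE_TIMEOUT + payload_size / ASSUMED_BYTES_PER_SECOND


def iter_json_lines_chunks(records, max_records=1000):
    """Yield JSON Lines byte chunks holding at most max_records records each."""
    chunk = []
    for record in records:
        chunk.append(json.dumps(record))
        if len(chunk) >= max_records:
            yield ("\n".join(chunk) + "\n").encode("utf-8")
            chunk = []
    if chunk:
        # Flush the final, possibly shorter chunk.
        yield ("\n".join(chunk) + "\n").encode("utf-8")
```

A sender could pass `timeout_for(len(payload))` as the request timeout and mark the gzip body with a `Content-Encoding: gzip` header. Whether the receiving endpoint decodes compressed request bodies is an assumption that has to be checked on the server side.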