scancode.io
Job failed and there is no clear indication of what the problem is
Describe the bug: The job failed with no clear indication of what the problem is. I'm running in debug mode and could not work out the issue from the Docker logs.
System configuration
- Which version of ScanCode.io are you running? scancode.io: v34.0.0, scancode-toolkit: v32.0.8
- Are you running the app using Docker? Yes
- On which OS? Linux - 22.04.1-Ubuntu
- What inputs are you using? C++ project code with its third-party modules.
- Which pipeline are you running? scan_codebase
To Reproduce: Steps to reproduce the behavior: I archived all files into a .tgz file and started a new scan_codebase pipeline (to detect the licenses present in my project).
The pipeline failed twice.
The response.json file from [MYSERVER]/api/projects/[uuid]/results/ is malformed as well.
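To show where a malformed results file breaks, a minimal Python sketch along these lines may help. It only uses the standard `json` module; the `check_json` helper and the truncated sample string are hypothetical and not taken from the actual response.json.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg are attributes of json.JSONDecodeError.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical sample: a results document cut off before its closing brackets.
truncated = '{"files": [{"path": "src/main.cpp"}'
print(check_json(truncated))
```

A file that stops in the middle of the document usually points to the run being interrupted while writing results, which would be worth mentioning in the report.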
Thanks for the report. What are the messages you get when you click on this? Have you tried to reset the project and rerun it?
@pombredanne Which messages do you mean? What should I click on? I tried several runs and the bug reproduces every time.
@RabeeaEgbareia Thanks... there is not much of interest there.
Is tcr.tgz public code that you can share? Is there something in the server logs that shows more details? Is there a way to reproduce this on a smaller codebase?
Also, can you reset the project (in settings) so that the same pipeline is not listed twice when you start? Or maybe run it from the command line in non-async mode so we have more details?
@pombredanne tcr.tgz is internal code that I cannot share. I do not know how to reproduce this with another input. I noticed that the worker container stopped unexpectedly ( Exited (0) ). I restarted it several times. What should I search for in the log to get more details? The log is big; this run took about 1 hour.
Another note: I created two different jobs with the input tcr.tgz, and both failed. I have another project which also fails (with a different input).
@pombredanne Do you have a stable version of scancode.io that I can rely on? A recommended one? Can ScanCode work with large projects, and with good performance?
@RabeeaEgbareia I routinely scan large codebases (like 5 to 10GB) and I know of users scanning 30 to 50GB codebases. So it works there.
Here I am a bit at a loss as I cannot reproduce this easily since your code is private.
- Is this an issue with a timeout?
- Can you find a way to reproduce the problem using open source code?
Alternatively you could also reach out to nexB for commercial support.
@pombredanne
- I checked the logs of my operating system and it looks like a memory issue, so I increased the memory from 8GB to 16GB.
- I wish I could find a simple way to reproduce this or even give you my code, but unfortunately I cannot.
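For anyone checking their own system logs for the same kind of memory problem, here is a small Python sketch that filters kernel log text (for example the output of `dmesg`) for out-of-memory messages. The pattern assumes the usual Linux OOM-killer wording, and the sample log lines are made up for illustration, not copied from this run.

```python
import re

# Assumption: the kernel log uses the usual Linux OOM-killer wording.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|oom_reaper", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the log lines that mention the kernel OOM killer."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


# Hypothetical log excerpt, for illustration only.
sample = """\
[1234.5] python invoked oom-killer: gfp_mask=0x100cca, order=0
[1234.6] Out of memory: Killed process 4242 (python) total-vm:9000000kB
[1240.0] eth0: link up
"""

for line in find_oom_lines(sample):
    print(line)
```

If such lines show up around the time the worker stopped, the process was most likely killed for lack of memory rather than failing inside the pipeline itself.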
Could you please give me an estimate of how many hours you think it could take to process 1GB of code files and JARs?
@RabeeaEgbareia I don't think that we can provide any estimate without more information about the contents of the 1GB codebase.
I second this comment ... this is really hard as there are many parameters and we would need to get the specifics of your codebase. It could be that there are a few weird files that are causing problems.
As a data point, for a codebase of 1.3GB with thousands of JARs and npms using a fairly comprehensive map_deploy_to_develop pipeline that also does matching to the PurlDB and a lot of work, it took about 8 hours on this machine https://www.hetzner.com/dedicated-rootserver/ax41-nvme/ ... but not really properly configured for speed.
For a docker pipeline with a 1GB image it took 58 minutes on the same machine.
A scan_single_package of a 244MB https://github.com/scummvm/scummvm/archive/refs/tags/v2.8.0.zip took 32 minutes.
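To put these three data points side by side, here is a quick back-of-the-envelope calculation in Python. It only uses the sizes and durations quoted above and assumes 1 GB = 1000 MB; the `throughput` helper is just for illustration.

```python
# Data points quoted above: (label, size in MB, duration in minutes).
# Assumption: 1 GB is taken as 1000 MB.
runs = [
    ("map_deploy_to_develop, 1.3GB codebase", 1300, 8 * 60),
    ("docker, 1GB image", 1000, 58),
    ("scan_single_package, 244MB archive", 244, 32),
]


def throughput(size_mb, minutes):
    """Average processing rate in MB per minute."""
    return size_mb / minutes


for label, size_mb, minutes in runs:
    print(f"{label}: {throughput(size_mb, minutes):.1f} MB/min")
```

The rates come out at roughly 3, 17 and 8 MB per minute, a spread of about six times on the same machine, which is why a per-GB estimate is hard to give without knowing the pipeline and the contents of the codebase.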