scancode.io
Option to not store images
Maybe this already exists, but could there be an option to not store images after a scan completes?
@cco3 do you mean to not store the images in the database as codebase resources (not scanned)? Or to not keep them on the file system (project work directory) after a scan?
Would you need this option on a per-project basis or for a whole ScanCode.io instance?
I would like to store nothing more than the report and associated metadata. Ideally, the only thing that would need to be persistent would be the DB.
Furthermore, I'm concerned that even with a large disk, we will fill up local storage with images and the tool will just stop working unless we add some sophisticated way of handling it.
https://scancodeio.readthedocs.io/en/latest/scanpipe-concepts.html#project-workspace
To be sure I understand properly, you would like to remove the content of both the input/ (input files as uploaded/downloaded) and the codebase/ (extracted content) directories?
Correct. I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?
I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)
I don't suppose there's a way to use SCANCODEIO_WORKSPACE_LOCATION to accomplish what I want, is there?
> I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?
Generated reports are stored on disk (in the output/ project directory), but they can be regenerated anytime from the DB data. When you click on a "Download" link in the UI, a fresh report is generated and sent.
> I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)
You can specify the location of the workspace using the SCANCODEIO_WORKSPACE_LOCATION setting (https://scancodeio.readthedocs.io/en/latest/scancodeio-settings.html#scancodeio-workspace-location), as long as it is a mounted location on the filesystem. We could add remote storage support in the future.
In the short term, you can wipe the content of your workspace location (its path is shown in the header of any project details view in the web UI).
We will add automated ways to run those cleanups.
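Until an automated cleanup exists, the manual wipe could be scripted along these lines. This is only a minimal sketch, not part of ScanCode.io: the function name `wipe_project_files` is hypothetical, and it assumes each project work directory contains the input/, codebase/, and output/ directories mentioned above. It empties input/ and codebase/ and leaves output/ alone.

```python
import shutil
from pathlib import Path


def wipe_project_files(project_work_dir, subdirs=("input", "codebase")):
    """Empty the given subdirectories of one project work directory.

    Hypothetical helper, not a ScanCode.io API. Assumes the work directory
    layout discussed in this thread (input/, codebase/, output/).
    Returns the list of directories that were emptied.
    """
    root = Path(project_work_dir)
    emptied = []
    for name in subdirs:
        target = root / name
        if not target.is_dir():
            continue  # this project has no such directory, nothing to clean
        shutil.rmtree(target)  # delete the directory and everything in it
        target.mkdir()  # recreate it empty so the layout stays intact
        emptied.append(target)
    return emptied
```

It would be called with the path of a single project work directory (the one shown in the project details header). Run it only once the pipeline has completed, since the deleted files cannot be recovered from the DB.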
Thanks! Is this the behavior on the current release or the next one?
The SCANCODEIO_WORKSPACE_LOCATION setting and the workspace system have been around for a while.
I meant the behavior to regenerate a report when it's no longer on disk. That hasn't worked for me with the current release.
@cco3 which reporting format are you using, json or xlsx?
I think we are going to end up primarily using xlsx.
See also https://github.com/nexB/scancode.io/issues/356
@cco3 Since we now have the option to archive a project with #205, and there is a related issue about using external storage with #356, how do you see this issue evolving? Is it still relevant in this context?