Option to not store images

Open cco3 opened this issue 4 years ago • 15 comments

Maybe this already exists, but could there be an option to not store images after a scan completes?

cco3 avatar Apr 19 '21 21:04 cco3

@cco3 do you mean to not store the images in the database as codebase resource (not scanned)? Or to not keep those on the file system (project work directory) after a scan?

Would you need this option on a per-project basis or for a whole ScanCode.io instance?

tdruez avatar Apr 20 '21 14:04 tdruez

I would like to store nothing more than the report and associated metadata. Ideally, the only thing that would need to be persistent would be the DB.

cco3 avatar Apr 20 '21 18:04 cco3

Furthermore, I'm concerned that even with a large disk, we will fill up local storage with images and the tool will just stop working unless we add some sophisticated way of handling it.

cco3 avatar May 05 '21 21:05 cco3

https://scancodeio.readthedocs.io/en/latest/scanpipe-concepts.html#project-workspace

To be sure I understand properly, you would like to remove the content of both the input/ (input files as uploaded/downloaded) and the codebase/ (extracted content) directories?

tdruez avatar May 06 '21 14:05 tdruez

Correct. I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?

cco3 avatar May 06 '21 19:05 cco3

I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)

cco3 avatar May 06 '21 19:05 cco3

I don't suppose there's a way to use SCANCODEIO_WORKSPACE_LOCATION to accomplish what I want, is there?

cco3 avatar May 27 '21 20:05 cco3

> I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?

Generated reports are stored on disk (in the output/ project directory), but they can be regenerated at any time from the DB data. When you click on a "Download" link in the UI, a fresh report is generated and sent.

> I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)

You can specify the location of the workspace using the SCANCODEIO_WORKSPACE_LOCATION setting (https://scancodeio.readthedocs.io/en/latest/scancodeio-settings.html#scancodeio-workspace-location), as long as it is a location mounted on the filesystem. We could add remote storage support in the future.

In the short term, you can manually wipe the content of your workspace location (its path is shown in the header of any project details view in the web UI).

We will add automated ways to run those cleanups.
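
For a manual cleanup of a single project, something along these lines could work. This is only a minimal sketch, not part of ScanCode.io: the `wipe_project_dirs` helper is hypothetical, and it assumes the project work directory contains the input/ and codebase/ directories discussed above, with output/ left untouched.

```python
import shutil
from pathlib import Path


def wipe_project_dirs(project_work_dir, subdirs=("input", "codebase")):
    """
    Delete everything inside the given subdirectories of a project work
    directory, keeping the (now empty) subdirectories themselves.

    Hypothetical helper, not a ScanCode.io API. Returns the removed paths.
    """
    project_work_dir = Path(project_work_dir)
    removed = []

    for name in subdirs:
        directory = project_work_dir / name
        if not directory.is_dir():
            # Nothing to clean if the subdirectory does not exist.
            continue

        for entry in directory.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                # Real directories are removed recursively.
                shutil.rmtree(entry)
            else:
                # Files and symlinks are simply unlinked.
                entry.unlink()
            removed.append(entry)

    return removed
```

Calling `wipe_project_dirs("/path/to/project/work/directory")` with the path shown in the project header would empty input/ and codebase/ only. This is destructive, so it should run only once the pipelines for that project have completed.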

tdruez avatar May 28 '21 15:05 tdruez

Thanks! Is this the behavior on the current release or the next one?

cco3 avatar May 29 '21 02:05 cco3

The SCANCODEIO_WORKSPACE_LOCATION setting and the workspace system have been around for a while.

tdruez avatar May 31 '21 06:05 tdruez

I meant the behavior to regenerate a report when it's no longer on disk. That hasn't worked for me with the current release.

cco3 avatar Jun 01 '21 18:06 cco3

@cco3 which reporting format are you using, json or xlsx?

tdruez avatar Jun 01 '21 18:06 tdruez

I think we are going to end up primarily using xlsx.

cco3 avatar Jun 01 '21 18:06 cco3

See also https://github.com/nexB/scancode.io/issues/356

pombredanne avatar Apr 15 '22 08:04 pombredanne

@cco3 Since we now have the option to archive a project with #205, and there is a related issue about using external storage with #356, how do you see this issue evolving? Is it still relevant in this context?

pombredanne avatar Apr 15 '22 08:04 pombredanne