scancode.io
Progress on image download
The web interface shows progress on image upload (thanks!).
However, it does not show any progress if you specify a URI to download, which for larger images could be quite useful.
@cco3 yes! Progress reporting is a difficult matter, especially here, where we may fetch things in multiple ways using multiple libraries. For container and Docker registries (e.g. "docker://" URLs) we use skopeo, which itself does not report progress yet (see https://github.com/containers/skopeo/issues/658#issuecomment-982345744 ).
For HTTP URLs we mostly use the Python requests library, which should be able to provide some progress reporting by iterating over the content stream. https://towardsdatascience.com/how-to-download-files-using-python-part-2-19b95be4cdb5 is a nice article on streaming in requests.
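A minimal sketch of that streaming approach, assuming requests is installed. The names write_with_progress, download_with_progress and the report callback are hypothetical and are not part of ScanCode.io; the total is only known when the server sends a Content-Length header.

```python
def write_with_progress(chunks, destination, total=None, report=print):
    """Write chunks to destination, calling report(done, total) after each chunk."""
    done = 0
    with open(destination, "wb") as output:
        for chunk in chunks:
            output.write(chunk)
            done += len(chunk)
            report(done, total)
    return done


def download_with_progress(url, destination, report=print, chunk_size=1024 * 1024):
    """Stream url to destination with requests, reporting progress per chunk."""
    # Third-party import kept local so the helper above works without requests.
    import requests

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # Content-Length may be missing (e.g. chunked transfer encoding), and it
        # can differ from the bytes yielded when the response is compressed.
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        chunks = response.iter_content(chunk_size=chunk_size)
        return write_with_progress(chunks, destination, total, report)
```

The report callback is where a UI could update a progress bar; with total set to None it can only show the number of bytes received so far.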
A tool- and library-independent alternative could be to track progress by watching the files being written, but tools usually write only to some opaque temp file rather than progressively to the final destination, which makes this impractical or even impossible.
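For illustration, the file-watching idea could look like the sketch below. It assumes the path being written and the expected final size are both known in advance, which is exactly what is often not the case; file_progress is a hypothetical helper, not existing code.

```python
import os


def file_progress(path, expected_size):
    """Return the fraction (0.0 to 1.0) of expected_size currently on disk at path."""
    if expected_size <= 0:
        raise ValueError("expected_size must be positive")
    try:
        current_size = os.path.getsize(path)
    except OSError:
        # The file does not exist yet (or is not readable): no progress to show.
        return 0.0
    return min(current_size / expected_size, 1.0)
```

A caller would poll this function at an interval while the external tool runs. If the tool writes to a temp file with an unknown name, the function keeps returning 0.0 until the final move.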
One idea would be to delegate the actual fetching of inputs to the pipeline execution task. The project would be created instantly and the download would happen at the start of the pipeline task, as a pre-step, to ensure all the project inputs are available.
Note that, for a start, we now have a spinner in the web UI while the upload takes place. But that's not exactly what you are looking for, @cco3, right?
"One idea would be to delegate the actual fetching of inputs to the pipeline execution task."
I am inclined to go with this. There is always a risk that the gap between the time of creation/check and the time of download/use creates a small TOCTOU (time-of-check to time-of-use) issue, but that may be minor if understood. One way to mitigate this would be to make the background download optional (in the UI, the API and the CLI).
We would still have an issue of progress reporting on docker:// URLs until there is upstream support.
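The deferred-fetch idea with an opt-out can be sketched as follows. This is only an illustration under assumed names: Project, create_project, run_pipeline, fetch_inputs and the defer_download flag are all hypothetical and do not reflect the actual ScanCode.io models or API. The fetch argument stands in for whatever downloads a single URL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    """Hypothetical stand-in for a project; not the ScanCode.io model."""

    name: str
    input_urls: List[str]
    inputs: Dict[str, bytes] = field(default_factory=dict)


def fetch_inputs(project, fetch):
    """Download every input URL that has not been fetched yet."""
    for url in project.input_urls:
        if url not in project.inputs:
            project.inputs[url] = fetch(url)


def create_project(name, input_urls, fetch, defer_download=True):
    """Create the project, downloading right away only if deferral is disabled."""
    project = Project(name, list(input_urls))
    if not defer_download:
        fetch_inputs(project, fetch)
    return project


def run_pipeline(project, steps, fetch):
    """Run the steps after a pre-step that ensures all inputs are available."""
    fetch_inputs(project, fetch)
    return [step(project) for step in steps]
```

With defer_download=True, project creation returns immediately and the download cost moves into the pipeline run. With defer_download=False, the inputs are fetched at creation time, which narrows the check/use window, and the pre-step finds nothing left to download.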
A spinner doesn't tell me how much progress has been made. Given the size of these images, something more informative would be helpful.
The solution is related to https://github.com/nexB/scancode.io/issues/410
The download of the Project's inputs now takes place in the pipeline run.