Please provide a basic progress / transfer speed output ("CI compatible") on copy

Open frittentheke opened this issue 5 years ago • 33 comments

First of all, thank you massively for Skopeo and its ability to efficiently copy images between registries. Especially when having to use proxies and other special setups, Skopeo is gold!

Just like @baracoder in issue https://github.com/containers/skopeo/issues/597, we are using Skopeo within GitLab CI jobs.

Having Skopeo not print and update the otherwise nice progress indicators during a CI run is an improvement, but not having any indication of the current transfer speed is an issue for our use case. Because copy stays silent until it either succeeds or fails, it is hard to spot a (speed) issue with the transfer.

In our case, having some feedback on the transfer speed / likely success is crucial. May I kindly suggest introducing a setting (or checking whether a tty is present) to simply report the current speed / progress as a single line, prefixed with a timestamp, about once a minute or every 30 seconds? This should not clutter one's output in CI, and with the --quiet option still available it can still be switched off. This would also be nice for cron jobs, as a sort of "logging" of the progress.

frittentheke avatar May 20 '19 09:05 frittentheke

Thanks for your report.

Reporting the recent transfer speed should be reasonably possible using c/image/copy.Options.Progress; reporting overall progress is not currently possible through that interface (it does not report about all the involved blobs, and their sizes, in advance). Enhancing the progress reporting interfaces of c/image/copy to make this possible (or, ideally, to implement the current / WIP progress bar code in c/image/copy on top of that generic interface) would be nice, of course.
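
For illustration, a minimal sketch of how a recent transfer speed could be derived from periodic per-blob offset updates delivered on a channel. The `progressEvent` type and `recentSpeed` helper are hypothetical stand-ins using only the standard library; the real c/image progress types and fields may differ, and the one-second spacing between updates is an assumption of the simulation.

```go
package main

import (
	"fmt"
	"time"
)

// progressEvent is a hypothetical stand-in for the per-blob updates a copy
// library could deliver on a channel; it is not the real c/image type.
type progressEvent struct {
	Blob   string // blob identifier, e.g. a shortened digest
	Offset uint64 // total bytes of this blob transferred so far
}

// recentSpeed returns bytes per second between two offset samples of one blob.
func recentSpeed(prevOffset, curOffset uint64, elapsed time.Duration) float64 {
	if elapsed <= 0 || curOffset < prevOffset {
		return 0
	}
	return float64(curOffset-prevOffset) / elapsed.Seconds()
}

func main() {
	events := make(chan progressEvent)
	go func() {
		defer close(events)
		// Simulated updates, pretending one arrives per second.
		for _, off := range []uint64{1 << 20, 3 << 20, 4 << 20} {
			events <- progressEvent{Blob: "0e29546d541c", Offset: off}
		}
	}()

	const interval = time.Second // assumed spacing between updates
	last := map[string]uint64{}  // last seen offset per blob
	for ev := range events {
		speed := recentSpeed(last[ev.Blob], ev.Offset, interval)
		last[ev.Blob] = ev.Offset
		fmt.Printf("blob %s: %.2f MiB/s\n", ev.Blob, speed/(1<<20))
	}
}
```

Note that this only yields a per-blob speed; it says nothing about overall progress, which matches the limitation described above.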

mtrmac avatar May 20 '19 19:05 mtrmac

Thanks for the quick response. The issue I am having is to see the current progress (transfer speed per second) rather than the total process. In short: I simply want to see how quickly things are progressing currently.

frittentheke avatar Jun 25 '19 11:06 frittentheke

This got notably more complex in the meantime, because up to 6 blobs are now copied simultaneously; so, the concept of “current progress” does not make sense any more, without some sort of aggregated view.

mtrmac avatar Jun 25 '19 13:06 mtrmac

@mtrmac urgh, too bad :-( Thank you for keeping me / this issue updated.

As cool as it is when Skopeo does sync / copy multiple gigabytes of Docker layers in seconds on a fast local network, when you are dealing with potentially slow registries which are accessed via the internet across half the globe any indication if things are still moving or (almost) stalled would help.

frittentheke avatar Jun 26 '19 11:06 frittentheke

@vrothberg Can we use the progress bars that we use in Podman for this?

rhatdan avatar Oct 08 '20 14:10 rhatdan

> @vrothberg Can we use the progress bars that we use in Podman for this?

Yes, that's possible. There is an example on GitHub (https://github.com/vbauerster/mpb#bytes-counters) that indicates the download speed.

I am currently busy with other things, but the change should be straightforward. We need to change the decorators in createProgressBar -> https://github.com/containers/image/blob/master/copy/copy.go#L982.

vrothberg avatar Oct 09 '20 09:10 vrothberg

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Jun 06 '21 00:06 github-actions[bot]

unstale

frittentheke avatar Jun 06 '21 21:06 frittentheke

podman (master) $ ./bin/podman rmi -af; ./bin/podman pull docker.io/gcc                           
Trying to pull docker.io/library/gcc:latest...                                                    
Getting image source signatures                                                                   
Copying blob ad4592a9cb6d 9.00 MiB/s [===========================>----------] 39.1MiB / 52.4MiB   
Copying blob 5a2f691668eb 1.12 MiB/s [--------------------------------------] 1.9MiB / 187.3MiB   
Copying blob 7faeec18cdc0 done                                                                    
Copying blob 0ddda9701dd9 1.20 MiB/s [=========>----------------------------] 2.6MiB / 10.4MiB    
Copying blob a6e37b3a94cd 1.72 MiB/s [=>------------------------------------] 3.2MiB / 52.0MiB    
Copying blob f0031fb5d71f 1.48 MiB/s [==========================>-----------] 3.4MiB / 4.9MiB     
Copying blob 8f2ca4ed8981 8.06 MiB/s [===>----------------------------------] 11.5MiB / 121.3MiB  

As already mentioned by @mtrmac, copying layers is happening in parallel which makes it challenging to have a single indicator of the download speed.

But I want to revive the conversation on how we could get closer. @frittentheke, would the example above be of any help? Each layer would have the IO printed before the progress bar.

vrothberg avatar Jun 07 '21 08:06 vrothberg

@vrothberg this would absolutely help, and would be even more detailed than a summed-up data rate.

Just please also consider the usage of Skopeo in CI pipelines, which do not like constant updates to the same few lines all the time. Maybe an option to refresh the output only every so often would be sensible here?

frittentheke avatar Jun 07 '21 12:06 frittentheke

> As already mentioned by @mtrmac, copying layers is happening in parallel which makes it challenging to have a single indicator of the download speed.

It seems possible in principle to sum the speeds of the individual items; of course actually doing that, in concurrent code, and figuring out the relevant heuristics (smoothing / moving averages, and making sure the data for all streams covers the same time range, so that if 6 items “take turns” on a 100 MB/s link, 5 report 0 speed and 1 reports 100 MB/s at the time it is receiving data, we don’t sum that up to 600 MB/s) might end up pretty complex.
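
As a rough sketch of the "same time range" point: instead of summing per-stream speeds, one can sum the bytes all streams moved within one shared window and divide by the window length. The `sample` type and `aggregateSpeed` function are hypothetical names for illustration, and no smoothing or moving average is attempted here.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical record of how many bytes one stream moved
// during a window that is shared by all streams.
type sample struct {
	Stream string
	Bytes  uint64
}

// aggregateSpeed sums the bytes moved by all streams in one shared window
// and divides by the window length, giving bytes per second.
func aggregateSpeed(samples []sample, window time.Duration) float64 {
	if window <= 0 {
		return 0
	}
	var total uint64
	for _, s := range samples {
		total += s.Bytes
	}
	return float64(total) / window.Seconds()
}

func main() {
	const mb = 1000 * 1000

	// Six streams "take turns" on a 100 MB/s link: within a 6-second window,
	// each one moved 100 MB during its own second and was idle otherwise.
	var samples []sample
	for i := 0; i < 6; i++ {
		samples = append(samples, sample{Stream: fmt.Sprintf("blob-%d", i), Bytes: 100 * mb})
	}

	speed := aggregateSpeed(samples, 6*time.Second)
	fmt.Printf("aggregate: %.0f MB/s\n", speed/mb)
}
```

With these made-up numbers the aggregate comes out at the link speed of 100 MB/s, whereas adding up each stream's peak speed would give the misleading 600 MB/s mentioned above.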

But we are getting a bit into the weeds… do I understand correctly that the core need is to:

  • Not spam the log, but
  • Report download speed sometimes (per 30-60 seconds), to eventually detect very slow network transfers?

and it’s not very important what the actual data reported is, beyond the two concerns above?

mtrmac avatar Jun 07 '21 13:06 mtrmac

> But we are getting a bit into the weeds… do I understand correctly that the core need is to:
>
>   • Not spam the log, but
>   • Report download speed sometimes (per 30-60 seconds), to eventually detect very slow network transfers?
>
> and it’s not very important what the actual data reported is, beyond the two concerns above?

Yes. Maybe continue to print the single line that was introduced with https://github.com/containers/image/pull/558 and just add some progress info like "x of y Megabytes" or "z MB/s" or whatever, and then repeat the line every 30 or 60 seconds.
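
To make the idea concrete, here is a small sketch of such a periodically repeated, timestamped line. The line layout and the names `progressFields` and `statusLine` are assumptions for illustration, not existing Skopeo output; the demo uses a 10 ms ticker and made-up byte counts so that it finishes quickly, where a real tool would tick every 30 or 60 seconds.

```go
package main

import (
	"fmt"
	"time"
)

// progressFields renders the "x of y, z per second" part of the line.
func progressFields(done, total uint64, bytesPerSec float64) string {
	const mib = 1 << 20
	return fmt.Sprintf("%.1f MiB of %.1f MiB, %.2f MiB/s",
		float64(done)/mib, float64(total)/mib, bytesPerSec/mib)
}

// statusLine prefixes the progress fields with an RFC 3339 timestamp so
// that every line is self-contained in a CI or cron log.
func statusLine(now time.Time, done, total uint64, bytesPerSec float64) string {
	return now.UTC().Format(time.RFC3339) + " copy progress: " +
		progressFields(done, total, bytesPerSec)
}

func main() {
	// A real tool would use something like time.NewTicker(30 * time.Second).
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	const total = 50 << 20
	var done uint64
	for done < total {
		now := <-ticker.C
		done += 10 << 20 // pretend 10 MiB arrived since the last tick
		fmt.Println(statusLine(now, done, total, 10<<20))
	}
}
```

Each tick appends a new line instead of rewriting an existing one, so nothing in the output depends on terminal cursor control.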

frittentheke avatar Jun 07 '21 23:06 frittentheke

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Jul 08 '21 00:07 github-actions[bot]

This is not stale - was just talking to @vrothberg and @mtrmac about this ;-)

frittentheke avatar Jul 08 '21 08:07 frittentheke

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Aug 08 '21 00:08 github-actions[bot]

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Sep 08 '21 00:09 github-actions[bot]

@mtrmac @vrothberg @frittentheke Any movement on this?

rhatdan avatar Sep 08 '21 16:09 rhatdan

Not to my knowledge. To summarize the above conversation:

  • Skopeo (or c/image) should print at a given frequency the network IO
  • When there's no TTY available (e.g., in CI), the IO should be printed as a single line
  • When there's a TTY, we can add a new decorator to each progress bar
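
The TTY / no-TTY split in the list above could be sketched roughly as follows, using only the standard library. `looksInteractive` and `chooseReporter` are hypothetical names, and the check is assumed to be made against stdout. The character-device test is only a heuristic (for example, /dev/null is also a character device); a stricter check is `term.IsTerminal` from golang.org/x/term.

```go
package main

import (
	"fmt"
	"os"
)

// looksInteractive reports whether f appears to be a character device,
// which is a common heuristic for "attached to a terminal".
func looksInteractive(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// chooseReporter picks an output style: in-place progress bars when
// interactive, appended single-line reports otherwise (CI, cron, pipes).
func chooseReporter(interactive bool) string {
	if interactive {
		return "progress-bars"
	}
	return "periodic-lines"
}

func main() {
	mode := chooseReporter(looksInteractive(os.Stdout))
	fmt.Println("selected reporter:", mode)
}
```

Running the program directly in a terminal and then with its output piped through `cat` should select different reporters.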

@mtrmac @frittentheke does that sound right to you?

vrothberg avatar Sep 09 '21 06:09 vrothberg

The progress bars already include (per-layer) speed, so that’s a no-op.

I understand this as an opt-in, periodic, report, only in the non-interactive case.

mtrmac avatar Sep 09 '21 19:09 mtrmac

> The progress bars already include (per-layer) speed, so that’s a no-op.

I don't think it's enough (see https://github.com/containers/skopeo/issues/658#issuecomment-855733116).

Currently, we don't display the exact IO:

Copying blob a330b6cecb98 [=====>--------------------------------] 4.3MiB / 25.9MiB

I'd love to add another decorator indicating the IO:

Copying blob ad4592a9cb6d 9.00 MiB/s [===========================>----------] 39.1MiB / 52.4MiB

vrothberg avatar Sep 10 '21 09:09 vrothberg

You’re right, my mistake.

mtrmac avatar Sep 10 '21 10:09 mtrmac

Compare #1477 ; it’s not quite the same thing but it might need computing similar data.

mtrmac avatar Oct 07 '21 12:10 mtrmac

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Nov 07 '21 00:11 github-actions[bot]

@vrothberg I hope you don't mind me keeping this from staling out ....

frittentheke avatar Nov 07 '21 18:11 frittentheke

We don't just close stale issues; we use them as an opportunity to take a fresh look.

rhatdan avatar Nov 08 '21 21:11 rhatdan

It would be super useful to have some simple progress reporting that could be captured from stdout or stderr to provide some progress indications.

We use skopeo in ScanCode.io to fetch images, and it would help a lot to report some progress when we have larger images. FWIW, our code is at https://github.com/nexB/scancode.io/blob/00bf2545436ebcfc5e94f45f9a29a4b2abfe2131/scanpipe/pipes/fetch.py#L92; it is a CLI wrapper taking "docker://" URLs as inputs, which are accepted in the UI of scancode.io to fetch and scan whole docker images for origin, license and more.

See also https://github.com/nexB/scancode.io/issues/372

pombredanne avatar Nov 30 '21 07:11 pombredanne

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Dec 31 '21 00:12 github-actions[bot]

Gentle ping... I am still looking forward to this!

pombredanne avatar Jan 21 '22 16:01 pombredanne

A friendly reminder that this issue had no activity for 30 days.

This should not stale away .... @rhatdan @vrothberg

frittentheke avatar Jan 29 '22 22:01 frittentheke

Note that I have made some crude tests using script to pretend we are running interactively:

$ script --return  --flush -c "./skopeo copy  --insecure-policy docker://debian docker-archive:foo6.tar" -a log.txt

This gets us some output from mpb with escape sequences:

Script started on Sat 22 Jan 2022 07:43:55 PM CET
Getting image source signatures
Copying blob 0e29546d541c [--------------------------------------] 367.2KiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [>-------------------------------------] 1.2MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [>-------------------------------------] 1.9MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [=>------------------------------------] 2.7MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [==>-----------------------------------] 3.6MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [==>-----------------------------------] 4.0MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [==>-----------------------------------] 4.4MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [===>----------------------------------] 5.6MiB / 52.4MiB
ESC[1AESC[JCopying blob 0e29546d541c [====>---------------------------------] 6.4MiB / 52.4MiB

Then using strings to filter out the escape sequences yields something of sorts. Getting the output this way is warty and brittle... but if we could get this out of the box, without relying on script and strings, that would be perfectly good enough for me:

Script started on Sat 22 Jan 2022 07:43:55 PM CET
Getting image source signatures
Copying blob 0e29546d541c [--------------------------------------] 367.2KiB / 52.4MiB
[JCopying blob 0e29546d541c [>-------------------------------------] 1.2MiB / 52.4MiB
[JCopying blob 0e29546d541c [>-------------------------------------] 1.9MiB / 52.4MiB
[JCopying blob 0e29546d541c [=>------------------------------------] 2.7MiB / 52.4MiB
[JCopying blob 0e29546d541c [==>-----------------------------------] 3.6MiB / 52.4MiB
[JCopying blob 0e29546d541c [==>-----------------------------------] 4.0MiB / 52.4MiB
[JCopying blob 0e29546d541c [==>-----------------------------------] 4.4MiB / 52.4MiB
[JCopying blob 0e29546d541c [===>----------------------------------] 5.6MiB / 52.4MiB
[JCopying blob 0e29546d541c [====>---------------------------------] 6.4MiB / 52.4MiB
[JCopying blob 0e29546d541c [====>---------------------------------] 7.2MiB / 52.4MiB
[JCopying blob 0e29546d541c [=====>--------------------------------] 7.9MiB / 52.4MiB
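
For comparison, the filtering step can also be done in-process. This is a minimal sketch assuming the captured output only contains CSI-style sequences such as ESC[1A (cursor up) and ESC[J (erase); `stripANSI` is a hypothetical helper, not something Skopeo or mpb provides, and the sample line is shortened for the demo.

```go
package main

import (
	"fmt"
	"regexp"
)

// csi matches ANSI CSI escape sequences: ESC, '[', optional parameter and
// intermediate bytes, then one final byte in the range '@' to '~'.
var csi = regexp.MustCompile(`\x1b\[[0-9;?]*[ -/]*[@-~]`)

// stripANSI removes all CSI sequences from s and leaves other text intact.
func stripANSI(s string) string {
	return csi.ReplaceAllString(s, "")
}

func main() {
	raw := "\x1b[1A\x1b[JCopying blob 0e29546d541c [=>---] 2.7MiB / 52.4MiB"
	fmt.Println(stripANSI(raw))
}
```

Unlike the strings approach, this removes the whole sequence, so no stray `[J` prefix is left at the start of each line.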

@vbauerster this project uses your excellent mpb https://github.com/vbauerster/mpb :bow: .... would there be a way to get progress provided optionally in a non-interactive mode without terminal escape sequence decoration?

@cco3 FYI, this is the underlying issue making it hard(er?) to report progress when fetching images in https://github.com/nexB/scancode.io/issues/372

pombredanne avatar Jan 30 '22 09:01 pombredanne