i3lock-fancy

The script is slow

Open simlevesque opened this issue 9 years ago • 36 comments

Hi, I was wondering if it was just me or is the script really slow ?

simlevesque avatar Nov 23 '15 23:11 simlevesque

It uses ImageMagick's convert tool to modify images, twice at that. The script isn't asynchronous, because if it were you would have broken or missing components (some perhaps disabling the program altogether, at least from just a glance). Really the only thing you can do is deal with it being "slow" (although it could be much slower), or even find a different script to use.

Also, if this sounds rude, apologies, it isn't meant to.

Sweets avatar Nov 24 '15 00:11 Sweets

It takes about 2.5s on my netbook to run. I resize the image down, blur it and then resize back up, which helped, but yeah, it still takes a few seconds.

You can speed it up a bit more by changing the blur setting from -filter Gaussian -resize 50% -define filter:sigma=4.5 -resize 200% to -filter Gaussian -resize 50% -define filter:sigma=2.5 -resize 200% but then the background isn't as blurred.

You could also skip doing a blur altogether and pixelate the background instead with scale.

I might play around with resizing even smaller than 50%, perhaps 30% would be better, especially for larger screen sizes. But I am also open to suggestions to help improve the speed.

The speed issue is partly just the nature of the image editing that is happening. Try doing a blur on an image with gimp or using a filter in gimp to see, it just takes some time to do. For my use, I use xautolock to lock automatically after 5 min so I don't really notice the initial run speed as I'm generally away from the computer anyways.

meskarune avatar Nov 24 '15 00:11 meskarune

I just tested with 30% resize down and 333.33% resize up, and it didn't significantly improve the speed for me. It went from like 2.5s to 2.3s. Then I changed the sigma to 2.5 and the speed was like 2.2s. It's not really a huge improvement, but I could try with these settings if people don't mind the change in quality.
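The resize-down and resize-up pairing tested above can be parameterized. Here is a minimal sketch, assuming a hypothetical helper named blur_args (it is not part of the script) that derives the upscale percentage as 10000 / downscale so the image comes back to its original size:

```shell
#!/bin/sh
# blur_args SCALE SIGMA
# Hypothetical helper (not part of i3lock-fancy): prints the convert options
# for "shrink, Gaussian-filter, enlarge back". The upscale percentage is
# 10000 / SCALE so that the two resizes cancel out.
blur_args() {
    up=$(awk -v s="$1" 'BEGIN { printf "%.2f", 10000 / s }')
    printf '%s\n' "-filter Gaussian -resize $1% -define filter:sigma=$2 -resize $up%"
}

# Example (needs ImageMagick; file names are placeholders):
#   convert shot.png $(blur_args 30 2.5) blurred.png
```

Calling blur_args 30 2.5 yields the 30% / 333.33% / sigma 2.5 combination, and blur_args 50 4.5 yields the original settings, which makes it easy to time several combinations against the same screenshot.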

screenshots:

regular script

lock screen shot, slow

resize changed to 30% and sigma 2.5

lock screen shot improved

meskarune avatar Nov 24 '15 01:11 meskarune

@Coilest

"Really the only thing you can do is deal with it being "slow" (although it could be much slower), or even find a different script to use."

Or I could help you fix it... Maybe you like slow things, it's just not my thing.


Ok, here's where I'm at :

  • It's slow on my end because I have multiple monitors.
  • The crunching into a 1 pixel block looks like the most expensive operation.
  • The only part that needs to be crunched is the center monitor since it is the only one which will have text on it.
  • You need every monitor to be blurred, but you only need to crunch the middle screen, so there should be two scrot calls (scrot is not an expensive operation).
  • From my observations, it seems like crunching a jpg is faster than crunching a png. The screen shot to be crunched should be in .jpg.
  • It's even faster if you set the quality to one (-q 1) for the jpg scrot since it's quicker to crunch a jpg which is lossy (that's how you make a thumbnail).

I'll try to send you a pull request ASAP !

simlevesque avatar Nov 24 '15 20:11 simlevesque

Some more :

  • You can't tell scrot to take a picture of a single monitor, you'd need another step and it would add time.
  • While reading the man page for scrot, I discovered a feature that could speed up the process : the -t argument which creates a thumbnail at the same time as it shoots the picture. The thumbnail can be crunched faster since it's already smaller.

simlevesque avatar Nov 24 '15 22:11 simlevesque

My best try at fixing the script takes one more second than meskarune/i3lock-fancy: 11 seconds for my version and 10 seconds for yours. Three monitors, ImageMagick with OpenCL & OpenMP.

By the way : if you wanna compare performance between various imagemagick operations, add the --bench argument.

simlevesque avatar Nov 24 '15 23:11 simlevesque

@simlevesque I was thinking of switching to maim and had a similar idea to yours. Basically just grab the center bit of the screen where the text will be to determine color instead of the whole screen. A 300px square area in the center would probably cover it. Maim can grab a selection of the screen as well as take screenshots.

https://github.com/naelstrof/maim

The downside is maim isn't as popularly available as scrot in various linux distros, but then i3lock-color isn't either I suppose.

Thanks for the tip about --bench it will come in useful :)

meskarune avatar Nov 25 '15 03:11 meskarune

I am using maim in my fork, so I can play around with --localize. I will need to figure out a way to get relative locations, though, instead of hard-coding for a certain screen resolution.

pid1 avatar Nov 28 '15 05:11 pid1

I use this in teiler: https://github.com/DaveDavenport/xininfo

carnager@caprica ~ > xininfo -mon-size
2560 1440
carnager@caprica ~ > xininfo -active-mon
0

carnager avatar Jan 11 '16 00:01 carnager

What if we hid the problem from the user by first applying some transitional background (a greyish filter on top of the screenshot, for example) so we can show the locked screen ASAP, then do the calculations in the background and, once finished, switch to the "nice" background?

The biggest problem for me is: when I hit the lock shortcut it seems like nothing happened, and after ~10s it locks. I wouldn't care if it didn't look that good while the script renders the background, as long as I know my system is already locked and I can leave my computer instead of waiting 10s to verify it's locked.

What do you guys think?

insanebits avatar Jan 27 '16 22:01 insanebits

@insanebits could you try the current script and let me know how slow it is for you? I have changed some things since this issue was opened that have helped the speed for me significantly.

meskarune avatar Mar 23 '16 16:03 meskarune

Nice to hear about it. On my system latest version takes about ~2.5s which is acceptable. Awesome job! Will test it tomorrow on my work machine which is a bit older and see how it performs.

insanebits avatar Mar 23 '16 22:03 insanebits

Ok I made another change. Instead of taking an average color of the entire screen, I crop it to the center 100px. This has 2 advantages: it is faster, and it uses only the center color to determine if it should be dark or light, so in situations like this:

screen shot of the lock with majority dark colors but light in the center

It does the right thing.
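The center-crop idea can be sketched roughly as follows. This is not the script's actual code: the function names and the threshold of 50 are assumptions for illustration, and the brightness measurement relies on ImageMagick's convert.

```shell
#!/bin/sh
# theme_for_brightness PERCENT
# Hypothetical helper: PERCENT is the mean brightness (0-100) of the center
# crop. The threshold of 50 is an assumption chosen for illustration.
theme_for_brightness() {
    if [ "$1" -gt 50 ]; then
        echo dark     # bright background: use dark text and icon
    else
        echo light    # dark background: use light text and icon
    fi
}

# center_brightness FILE
# Mean brightness of the center 100x100 px as an integer percentage.
# Needs ImageMagick's convert.
center_brightness() {
    convert "$1" -gravity center -crop 100x100+0+0 +repage \
        -colorspace Gray -format '%[fx:int(100*mean)]' info:
}

# Example (file name is a placeholder):
#   theme_for_brightness "$(center_brightness shot.png)"
```

Because only a 100x100 region is analysed, the cost of this step no longer depends on the total screen size.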

meskarune avatar Mar 25 '16 14:03 meskarune

Instead of using the Gaussian filter, you can use ImageMagick's -blur option directly, which is recommended and explained here: http://www.imagemagick.org/Usage/blur/ You can try changing line 7 to EFFECT=(-resize 20% -blur 6x3 -resize 500.5%). The result is "more blurred" (acceptable in my opinion), but it takes less time. In my case it decreased from 1.7s to 1.3s. It's said that the bigger (or more complex) the image is, the more significant the difference is. Tinker with the {radius}x{sigma} ratio and I think you will find the best setting (on my monitor 6x3 seems good).

cer-nagas avatar Oct 14 '16 23:10 cer-nagas

Actually, on this note, my fork of i3lock will have a blurring option, so that'll be moved out of this script and will vastly speed up the locking. You can read a bit more about this in #57.

There'll be some interesting things to consider for implementation but I think the end result will be much, much better either way.

PandorasFox avatar Oct 15 '16 01:10 PandorasFox

Random idea:

Run convert processes in parallel by generating both images for dark and light backgrounds. So 3x resources for 2x duration, I guess. Also exit codes are likely lost, unless relying on bash 4.4.
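A minimal sketch of the parallel idea, assuming a hypothetical helper named run_both. One way to keep each job's exit status in plain sh is to wait on each PID individually:

```shell
#!/bin/sh
# run_both CMD_A CMD_B
# Hypothetical helper: runs two command strings concurrently and returns the
# exit status of the last one that failed (0 if both succeeded).
run_both() {
    status=0
    sh -c "$1" & pid_a=$!
    sh -c "$2" & pid_b=$!
    wait "$pid_a" || status=$?
    wait "$pid_b" || status=$?
    return "$status"
}

# Example (dark_cmd and light_cmd are placeholders for the two convert calls):
#   run_both "$dark_cmd" "$light_cmd" || echo "image generation failed" >&2
```

Whether this actually saves time depends on how many cores are free, since each convert process may already use several threads.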

AladW avatar Oct 15 '16 14:10 AladW

Relevant: https://www.reddit.com/r/i3wm/comments/5ag6y7/psa_if_youre_using_imagemagickgraphicsmagick_to/

Airblader avatar Nov 01 '16 08:11 Airblader

I've no idea yet how ffmpeg's blurring will compare to what's pending for my i3lock fork (I have a feeling like ffmpeg's code will be a bit more polished, but may have some more overhead, so it'll balance out), but what I'd like to eventually nail down would be overlaying an image over what's blurred, so that i3lock can grab the screen and lock it ASAP, rather than wait for an image to be blurred.

That'd usually lead to a loss of things like custom overlaid text, lock icons, etc. unless that's also brought into i3lock/an i3lock fork (hint: no), unless I go tinker with it some more to allow overlaying transparent images over the blurred screenshot. I've got some hopes for that, so I'll see how it goes when I have time to tinker and implement that.

Hopefully this'll enable i3lock-fancy to be a lot faster and more streamlined (as well as eliminate some dependencies, I think).

PandorasFox avatar Nov 01 '16 20:11 PandorasFox

No offense, but relying on an i3lock fork to do all of the heavy lifting is hardly a portable thing to do. At the very least ffmpeg should be kept as a fallback.

AladW avatar Nov 01 '16 21:11 AladW

Perhaps, but if we're already grabbing the screen, it's pretty trivial to use xcb to just capture it and blur it. While ffmpeg can do gaussian blurring pretty quickly, I'll go do some timings but I think the timings will be fairly similar, and any performance loss in i3lock blurring will be offset by the fact that i3lock will be grabbing the screen and locking sooner, which is preferable, for me, at least.

i3lock already requires libxcb and uses cairo for drawing stuff, so I don't think that the blurring that @sebastian-frysztak is working on will introduce any more dependencies or make it less portable.

PandorasFox avatar Nov 01 '16 21:11 PandorasFox

@AladW: it will be portable. There are three blur implementations:

  • old, generic and slow - but at least it works everywhere
  • SSE2-optimized - with much better border handling than previous, and ~3..4 times faster. SSE2 is no news and everything since 2004 supports it, so this is effectively the default option.
  • SSSE3-optimized - should be even faster than SSE2, but needs some work.

We'll detect CPU's capabilities at runtime and use appropriate functions. There are no additional dependencies.

frysztak avatar Nov 01 '16 21:11 frysztak

Here's the timings from what I'm seeing with i3lock blurring vs ffmpeg:

arcana@archana:~/i3lock-color$ time (scrot in.png && ffmpeg -loglevel quiet -y -i in.png -vf "gblur=sigma=8" out.png && ./i3lock -i ./out.png)

real    0m0.380s
user    0m0.377s
sys     0m0.033s
arcana@archana:~/i3lock-color$ time ./i3lock -B

real    0m0.058s
user    0m0.043s
sys     0m0.007s

It may not be a perfect comparison, but handling the blurring in i3lock is much, much faster than anything external.

(The version of the blurring I used is the SSE2 version, I believe).

edit: blurring on my desktop with SSE3:

[arcana@archana i3lock-color]$ time ./i3lock -B

real    0m0.086s
user    0m0.047s
sys     0m0.010s
[arcana@archana i3lock-color]$ time (scrot in.png && ffmpeg -loglevel quiet -y -i in.png -vf "gblur=sigma=8" out.png && ./i3lock -i ./out.png)

real    0m0.777s
user    0m0.737s
sys     0m0.053s

I imagine that the main hit in performance (my CPU on my desktop is much stronger than my laptop CPU) comes from my desktop having ~4x as many pixels to work on. There's a ~9x speedup here compared to the ~7x speedup on my laptop, which suggests that this method is also better for larger resolutions.
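The speedup figures come from dividing the "real" times reported above. A small sketch of the arithmetic (the helper name speedup is hypothetical):

```shell
#!/bin/sh
# speedup SLOW FAST: ratio of two wall-clock times, to one decimal place.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 0.380 0.058   # laptop: scrot + ffmpeg + i3lock vs i3lock -B
speedup 0.777 0.086   # desktop: same comparison
```

This prints 6.6 and 9.0, which round to the ~7x and ~9x quoted above.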

PandorasFox avatar Nov 01 '16 21:11 PandorasFox

My point is that right now, you have some functionality if you use the i3-shipped i3lock, which is included with most Linux/BSD distributions. Moving everything into a fork means you have to do more than just fetch a shell script. Hence at minimum, there should be an ffmpeg fallback if i3lock -B is not available.
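A rough sketch of how such a fallback chain could be selected. The helper name first_available is hypothetical, and detecting whether the installed i3lock supports -B is left out because the thread doesn't say how to do it; the sketch only covers picking an external blur tool.

```shell
#!/bin/sh
# first_available CMD...
# Prints the first command name found on PATH; returns 1 if none is found.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Example: prefer ffmpeg for blurring, fall back to ImageMagick's convert.
#   blur_tool=$(first_available ffmpeg convert) || echo "no blur tool found" >&2
```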

AladW avatar Nov 01 '16 21:11 AladW

Oh yeah, definitely, but the script has always used at least some of the features specific to a fork (primarily being the custom colors); originally, it relied on a fork that was ... 5 or 6 years out of date. I was kinda uncomfortable with using a fork from 5 or 6 years ago on my machine since that just seems like a bad idea, which is how I kinda came about to this. I figured as long as I was doing that, I may as well make it more customizable.

PandorasFox avatar Nov 01 '16 21:11 PandorasFox

There's blurring in the mainline of i3lock-color, so if you want to tentatively start using that, it should work fine.

So far for overlaying text/etc you'll have to pad out some images and then just use that as an image that's overlaid over the blur. I'll work on image offsets (and potentially multiple images, though implementing that will likely get messy) soonish, since that should make things easier for you (don't have to generate a large number of images to overlay for various resolutions), and potentially alignment flags (bottom middle, right middle, middle middle, etc).

Also potentially could do them per-monitor instead of on the screen as a whole. I'll probably be refactoring a lot of the image handling code if I do this.

PandorasFox avatar Feb 15 '17 16:02 PandorasFox

Maybe blurring could be done using Imlib2?

owenthewizard avatar Mar 16 '17 02:03 owenthewizard

I made a fork of scrot and added blurring and adding an icon in the code. Instead of 2-5 seconds, now it runs in about 150 - 500 milliseconds. You can check the code here darddan/scrot.

darddan avatar Feb 22 '18 01:02 darddan

@darddan, I couldn't figure out how to use your fork with multiple displays.

If anyone is interested, here is what I came up with for two monitors. Works fast enough for me.

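# Lock icon to overlay on each monitor
LOCK=$HOME/.i3/i3lock-fancy/lock.png
# Total X screen size, e.g. 3840x1080
RES=$(xdpyinfo | grep dimensions | sed -r 's/^[^0-9]*([0-9]+x[0-9]+).*$/\1/')
# GNU mktemp; creates the temp file with the .png suffix itself
IMAGE=$(mktemp --suffix=.png)

# Grab one frame, darken and blur it, then overlay the icon at 1/4 and 3/4 of the width
ffmpeg -probesize 100M -thread_queue_size 32 -f x11grab -video_size "$RES" \
  -y -i "$DISPLAY" -i "$LOCK" -i "$LOCK" -filter_complex \
  "eq=gamma=0.75,boxblur=3:3,overlay=(main_w-overlay_w)/4:(main_h-overlay_h)/2,overlay=3*(main_w-overlay_w)/4:(main_h-overlay_h)/2" \
  -vframes 1 "$IMAGE"

i3lock -n -i "$IMAGE"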

AntonGitName avatar May 15 '18 22:05 AntonGitName

Two more suggestions:

1] Add an option to use a static image, that could be of a screenshot that the user manually takes, once. I implemented this in PR #124 ;

2] Convert the script from bash to dash. This will require replacing several bash idioms present throughout, such as the use of arrays. Here are two links discussing the speed difference, with attempts to benchmark it:

https://askubuntu.com/questions/1059474/are-there-concrete-figures-on-the-speed-of-bash-vs-dash

https://unix.stackexchange.com/questions/148035/is-dash-or-some-other-shell-faster-than-bash

Boruch-Baum avatar Aug 07 '18 13:08 Boruch-Baum

Re 2: Changing to dash will have minimal effect, it's convert et al that take the majority of compute time. See http://wiki.c2.com/?PrematureOptimization

AladW avatar Aug 07 '18 14:08 AladW