bimg icon indicating copy to clipboard operation
bimg copied to clipboard

Vips warnings and image artifacts at moderate concurrency

Open willtrking opened this issue 7 years ago • 19 comments

Hey there, been running into issues while working on a image server that uses bimg to perform jpeg -> webp conversion and scaling

At even moderate concurrency (20+ concurrent requests) in artificial tests, I've been getting tons of vips warning: VipsJpeg in my output, causing significant amounts of requests (50%+) to fail. If I don't reject images where bimg.Resize throws an error, I see significant artifacts in browser while tests are running.

The conversion funnel is simple, downloading an image from s3 (confirmed that warnings are NOT caused by incomplete downloads) and then converting the byte slices as follows

finalBuf, finalBufErr := bimg.Resize(respBuf, bimg.Options{
	Type:        bimg.WEBP,
	Width: 750,
	Quality:     40,
	Compression: 6,
})

I've also tried a simple jpeg scaling without webp conversion, and run into the same issue, code for that is simply

finalBuf, finalBufErr := bimg.Resize(respBuf, bimg.Options{
	Width: 750,
})

That converter is wrapped up in a simple fasthttp handler, no crazy logic yet as project is in early stages and I'm just testing out different image converters/resizers.

Server is running on an EC2 c4.8xlarge, so 36 "cores" and 60 GB of memory.

Should also be noted that this happens regardless of what I set runtime.GOMAXPROCS to, and I'm not fiddling with any of the vips concurrency settings.

Should also be noted that https://github.com/DAddYE/vips does not run into any issues, although is definitely slower (although hard to judge exactly given the error rate)

Love the library overall, been using it extensively in other areas, but the issues I see here make it difficult for me to use for this project.

EDIT:

Vips version info: vips-8.4.5-Fri Mar 10 07:17:47 UTC 2017

uname -a output: Linux ip-10-0-1-249 4.4.41-36.55.amzn1.x86_64 #1 SMP Wed Jan 18 01:03:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

EDIT EDIT:

Did more testing, discovered issue actually arises reliably at 20 concurrent requests (originally tested 50-100)

willtrking avatar Mar 10 '17 21:03 willtrking

Have you tried setting VIPS_CONCURRENCY environment variable to 50 or so? Not sure if it will help but worth trying.

phpb-com avatar Mar 11 '17 08:03 phpb-com

Tried it out, setting VIPS_CONCURRENCY to anything other then 1 produced the following message in addition to the original warning:

VipsRegion: valid clipped to nothing

willtrking avatar Mar 13 '17 20:03 willtrking

Unfortunately, this is something I've experienced in the past too. I'm still not sure the root of the issue (but I didn't dig into details yet). Any help here is always welcome!

h2non avatar Mar 13 '17 22:03 h2non

Hi @willtrking Could you provide a sample application which can reproduce the issue? warning: VipsJpeg might not be the root cause of failed requests according to some discussions about libvips.

  • https://github.com/lovell/sharp/issues/607
  • https://github.com/jcupitt/libvips/issues/544

touhonoob avatar Mar 14 '17 20:03 touhonoob

Hello, this should be fixed by https://github.com/jcupitt/libvips/issues/639 , which is released as libvips 8.5.4. I'm sorry this has been causing problems. There's a new thing in the test suite which hammers on exactly this error, so it shouldn't be able to regress again.

There was some stuff about this in the release notes for 8.5:

https://jcupitt.github.io/libvips/2017/03/16/What's-new-in-8.5.html#new-sequential-mode

There have been a couple of small problems with the new system, but it does seem to be working now, and it's a good improvement over the old mechanism.

jcupitt avatar Apr 25 '17 17:04 jcupitt

Hmm, 8.5.4 doesn't seem to fix it for me. I'm using the tarball release from https://github.com/jcupitt/libvips/releases. Still getting the following logs, as well as image artifacts.

(bimg:39): VIPS-WARNING **: 
(bimg:39): VIPS-WARNING **: read gave 68 warnings

When I serialize bimg (vips) with a simple mutex, there are no issues.

EDIT: Sorry about the close/reopen. Hit the wrong button

willtrking avatar Apr 26 '17 00:04 willtrking

hmm. I think I know how to fix it. pending analysis.

kirillDanshin avatar Apr 26 '17 13:04 kirillDanshin

@willtrking oh dear, then I guess it's a problem in the bimg layer. Could it be auto-rotate? You need to render via a memory buffer for this.

There's a page on how to resize with libvips:

https://github.com/jcupitt/libvips/wiki/HOWTO----Image-shrinking

for reference.

jcupitt avatar Apr 26 '17 13:04 jcupitt

@jcupitt Sorry about delay here, had to work on other projects for work so this took a back seat. Not doing any autorotate stuff (to my knowledge), to be specific I'm always running a crop/resize of a jpeg (the end width/height does vary, although the originating width/height is always within a few pixels of the same value) and converting the output to a lossy webp.

Turned out the mutex was probably hurting more then it was helping, so I've removed it.

Running now with libvips 8.5.5 , I've managed to tame down the amount of warnings I'm getting by disabling orc and setting VIPS_CONCURRENCY to my CPU count (in my case 8, I'm running a couple of AWS m4.2xlarge instances). Within the context of bimg I'm doing the following once when my server starts:

os.Setenv("VIPS_CONCURRENCY", strconv.Itoa(runtime.NumCPU()))
bimg.Initialize()

Adding os.Setenv("VIPS_TRACE", "true") seems to properly start tracing, so I'm assuming VIPS_CONCURRENCY is also properly being set.

I'm still getting the following warnings, although I haven't seen any artifacts:

(bimg:39): VIPS-WARNING **: read gave 10 warnings

(bimg:39): VIPS-WARNING **: 

(bimg:39): VIPS-WARNING **: read gave 2 warnings

(bimg:39): VIPS-WARNING **: 

When the server produces warnings as above, there's no guarantee that artifacts will be produced.

One thing worth mentioning is that I was incorrect in my initial issue, https://github.com/DAddYE/vips does seem to produce the same artifacts / warnings, although reproduction may be under different levels of concurrency from bimg.

Very willing to help contribute to a fix if indeed bimg is the culprit, although I'm at a bit of a loss in terms of starting points. Also if it helps I can send a VIPS_TRACE output. I think one of the most beneficial things would be to have an option that allows bimg to return an error if Resize causes a vips warning, which if there's interest I'd be willing to contribute toward. This way any warnings could either fail a request, or retry the Resize

willtrking avatar May 17 '17 08:05 willtrking

Hi again, libvips is being used at very high levels of concurrency without issue in several other projects, so while it could be a libvips issue I suspect it is probably something about the way bimg is using it. A retry on warning option seems to be ducking the problem: it would be better to fix it.

I'm afraid I've never used Go so I'm not clear about setting up a dev environment. If you could write a very brief idiot's guide for ubuntu and point me at a test-case for this issue, I'll have a go at reading the bimg source and looking for problems.

jcupitt avatar May 17 '17 09:05 jcupitt

Did anyone manage to resolve this? I'm getting image artifacts randomly on images I am resizing whenever I generate more than 1 at the same time. It seems to happen more often for high resolution large image.

vansante avatar Mar 25 '20 16:03 vansante

8.9.1 has a couple of races fixed, it'd be worth trying that if you're not on it.

jcupitt avatar Mar 25 '20 18:03 jcupitt

I just tried, but it didn't help unfortunately.

There seems to be a cutoff point after which the image is corrupted, for instance: image

It seems to be happening for images that have autorotated based on EXIF information. The portait images display the problem, but I never see it happening for landscape pictures.

vansante avatar Mar 26 '20 10:03 vansante

I had a quick look at the bimg source. This seems to be the line handling auto-rotate:

https://github.com/h2non/bimg/blob/master/resizer.go#L38

It's doing it at the start of processing, which is probably not optimal, I think. My suggestion would be to call the standard libvips vips_thumbnail operation instead, and add gamma and watermark handling after that.

jcupitt avatar Mar 26 '20 12:03 jcupitt

You seem to be right! I just created this MR: https://github.com/h2non/bimg/pull/324

And that seems to fix the issue (at first sight), except the portrait images now seem to lose (too much) resolution and end up fuzzy, so I'm doing something wrong.

vansante avatar Mar 26 '20 17:03 vansante

vips_thumbnail_image() can't exploit shrink on load (since the image has already been loaded). If you can, use vips_thumbnail_buffer() instead, it should be much quicker.

jcupitt avatar Mar 26 '20 17:03 jcupitt

@vansante hi, we've also seen this before - can you share how you're testing? I tried running with large concurrency and I am getting correct images (with 8.9.1). I'd like to double check

thanks!

urjitbhatia avatar Apr 28 '20 21:04 urjitbhatia

@urjitbhatia I basically had a large set (~50) of JPEG images of high resolution (something like 60 megapixel) which I unfortunately cannot share because they're from a client and thus confidential. I then had bimg generate 300x100px size thumbnails in something like 4 goroutines, after which in that set of 50 images there were always 1-5 images with corruption.

I solved it by now running on my own fork which does as @jcupitt suggested, which is to only execute vips_thumbnail_buffer() on an image. This unfortunately disables most of the rest of bimg's featureset, making it unlikely my MR will be merged, so I hope someone can mold it into something usable. Otherwise I might end up creating a smaller library for libvips that only does the thumbnailing with vips_thumbnail_buffer().

vansante avatar Apr 28 '20 21:04 vansante

@vansante thanks! that helps me test it better. I'll take a look at your MR and see if I can help thread it through the rest of the pipeline.

urjitbhatia avatar Apr 28 '20 22:04 urjitbhatia