Luaffi callback causes the process to crash

Open zhouming opened this issue 7 years ago • 7 comments

GDB bt

#0  0x00007f7f1bc96ae1 in lj_alloc_free (msp=0x40ebc010, ptr=0x410fff88)
    at lj_alloc.c:1376
#1  0x00007f7f1bc4ec6d in gc_sweep (g=g@entry=0x40ebc3b8, p=0x4020bcdc,
    lim=4294967294, lim@entry=4294967295) at lj_gc.c:406
#2  0x00007f7f1bc4f8a0 in gc_onestep (L=L@entry=0x41e64600) at lj_gc.c:628
#3  0x00007f7f1bc4fd4c in lj_gc_step (L=L@entry=0x41e64600) at lj_gc.c:689
#4  0x00007f7f1bc8ed85 in callback_conv_args (L=0x41e64600, cts=0x40eceec0)
    at lj_ccallback.c:619
#5  lj_ccallback_enter (cts=0x40eceec0, cf=<optimized out>)
    at lj_ccallback.c:687
#6  0x00007f7f1bc4eaf6 in lj_vm_ffi_callback ()
   from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#7  0x00007f7e5acea42a in vips_argument_map (object=0x7f7db0001a00,
    fn=0x7f7f1c731008, a=0x0, b=0x0) at object.c:581
#8  0x00007f7f1bc4ebd4 in lj_vm_ffi_call ()
   from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#9  0x00007f7f1bc8e3d4 in lj_ccall_func (L=L@entry=0x41e64600,
    cd=<optimized out>) at lj_ccall.c:1161
#10 0x00007f7f1bca24a6 in lj_cf_ffi_meta___call (L=0x41e64600) at lib_ffi.c:230
#11 0x00007f7f1bc4cae3 in lj_BC_FUNCC ()
   from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#12 0x00000000004e9279 in ngx_http_lua_run_thread (L=L@entry=0x40ebc378,
    r=r@entry=0x162df90, ctx=ctx@entry=0x1651470, nrets=nrets@entry=0)

After the process has been running for more than 2 hours, it may crash.

zhouming, Mar 18 '18 15:03

Hello @zhouming, it sounds like there might be a callback leak somewhere. Could you post a program that has this problem?

jcupitt, Mar 18 '18 17:03

So far I haven't been able to build a small test case. I can't reproduce the bug in the development environment.

Env:

CentOS 7 (kernel 3.10.0-693.17.1.el7.x86_64)
OpenResty 1.13.6.1
libvips 8.6.1

The code combines images and text.

Methods used:

new_from_buffer, text, extract_band, ifthenelse, crop, insert

zhouming, Mar 19 '18 06:03

Thanks for the info!

Thinking again, it looks like there is a GC being triggered here:

https://github.com/jcupitt/lua-vips/blob/master/src/vips/voperation.lua#L127

This is a call from LuaJIT into C, then from C back into LuaJIT. In your crash, the inner LuaJIT is doing a GC during argument conversion, and that is failing.

How about disabling the GC for this callback? Could you try changing your lua-vips from this:

            local cb = ffi.cast(voperation.argumentmap_typeof,
            ....
            )
            vips.vips_argument_map(self, cb, nil, nil )
            cb:free()

to:

            local cb = ffi.cast(voperation.argumentmap_typeof,
            ....
            )
            collectgarbage("stop")
            vips.vips_argument_map(self, cb, nil, nil )
            cb:free()
            collectgarbage("restart")

See:

http://luatut.com/collectgarbage.html

If it fixes your problem, I'll make this change to lua-vips.
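
The stop/restart pattern above leaves the collector paused if the wrapped call raises an error. A minimal sketch of a safer wrapper in plain Lua follows; `with_gc_stopped` and `finish` are hypothetical names, not part of lua-vips, and the commented call at the end only illustrates where `vips.vips_argument_map` would go. It shows the pattern only and makes no claim about whether pausing the collector avoids this crash.

```lua
-- Hypothetical helper (not part of lua-vips): restart the collector,
-- then either re-raise the error or pass the results through.
local function finish(ok, ...)
    collectgarbage("restart")
    if not ok then
        -- (...) keeps only the error value; level 0 adds no position info.
        error((...), 0)
    end
    return ...
end

-- Hypothetical helper: run fn(...) with the collector paused.
-- The collector is restarted even when fn raises an error.
local function with_gc_stopped(fn, ...)
    collectgarbage("stop")
    return finish(pcall(fn, ...))
end

-- Illustrative use inside voperation.lua (assumes self and cb exist there):
-- with_gc_stopped(function()
--     vips.vips_argument_map(self, cb, nil, nil)
-- end)
```

Routing the call through `pcall` means an error inside the callback path cannot leave the whole Lua state running with garbage collection switched off.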

jcupitt, Mar 19 '18 09:03

It doesn't work. Here is the new backtrace:

#0  0x00007fa122f70d69 in lj_alloc_free (msp=0x41750010, ptr=<optimized out>)
    at lj_alloc.c:1404
#1  0x00007fa122f28c6d in gc_sweep (g=g@entry=0x417503b8, p=0x413d8d28,
    lim=24, lim@entry=40) at lj_gc.c:406
#2  0x00007fa122f2970b in gc_onestep (L=L@entry=0x40c6b140) at lj_gc.c:637
#3  0x00007fa122f29d4c in lj_gc_step (L=L@entry=0x40c6b140) at lj_gc.c:689
#4  0x00007fa122f29e2b in lj_gc_step_jit (g=<optimized out>, steps=1)
    at lj_gc.c:722
#5  0x00007fa12399427d in ?? ()
#6  0x0000000000000256 in ?? ()
#7  0x0000000040c6b140 in ?? ()
#8  0x0000000040c6b140 in ?? ()
#9  0x00007ffc99cf4928 in ?? ()
#10 0x0000000041750378 in ?? ()
#11 0x0000000040c6b140 in ?? ()
#12 0x0000000000000001 in ?? ()
#13 0x00007fa122f72321 in lj_cf_collectgarbage (L=0x40260598) at lib_base.c:444
#14 0x00007fa122f26ae3 in lj_BC_FUNCC ()
   from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#15 0x00000000004e9279 in ngx_http_lua_run_thread (L=L@entry=0x41750378,
    r=r@entry=0x1274f90, ctx=ctx@entry=0x1298470, nrets=nrets@entry=0)
    at ../ngx_lua-0.10.11/src/ngx_http_lua_util.c:1010

As a workaround for now, a new child process is forked whenever the process crashes.

zhouming, Mar 21 '18 06:03

Oh that's a shame.

That looks like a different stack trace. Do you see crashes at random positions in your code? Sorry, without a small test case, this will be very difficult to debug.

It could also be a LuaJIT bug, of course. Version 2.1 is still in beta, and has had some GC problems. Perhaps you could test with an older openresty?

jcupitt, Mar 21 '18 08:03

Thanks, I will try.

zhouming, Mar 22 '18 06:03

I've not encountered this issue yet (with a lot of traffic, we resize 2 million images per hour). Could there be another problem in your code?

Callbacks are slow in LuaJIT. So, I tried to experiment with adding vips_object_get_args to libvips and calling it from Lua.

This may resolve this issue, and should be slightly faster (I've not benchmarked it yet). If you want, you could test it with https://github.com/kleisauke/libvips/commit/d33b7ec31b796c324c08c46accd42b433086b979 and https://github.com/kleisauke/lua-vips/commit/6d451f0493b788007fafeaf5a53c74547a0430d6 to see if this solves your problem.

kleisauke, Aug 09 '18 15:08

With Lua 5.3 I have a similar (or maybe the same) issue. It doesn't occur with Lua 5.4 though, maybe because of some differences in garbage collection. Here's a simple reproducer and a simple fix:

Reproducer: Run in the terminal

for i in {1..50}; do lua5.3 test.lua; done;

with the following script test.lua:

local vips = require "vips"
vips.Image.new_from_file("empty.pdf"):extract_band(0)

where you can use e.g. this PDF: test.pdf

On my machine it crashes about 7-8 times out of 50 on average.
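
To put a number on the crash rate, the shell loop can be replaced by a small driver that counts failed runs. This is a sketch under assumptions: `count_failures` and `succeeded` are hypothetical helpers, and the command string is whatever you use to start the reproducer.

```lua
-- Returns true when the command exited normally with status 0.
-- os.execute returns a status number in Lua 5.1 / LuaJIT, and
-- (true|nil, "exit"|"signal", code) in Lua 5.2 and later.
local function succeeded(first, kind, code)
    if type(first) == "number" then
        return first == 0
    end
    return first == true and kind == "exit" and code == 0
end

-- Runs cmd the given number of times and returns how many runs failed
-- (non-zero exit status or termination by a signal such as SIGSEGV).
local function count_failures(cmd, runs)
    local failures = 0
    for _ = 1, runs do
        if not succeeded(os.execute(cmd)) then
            failures = failures + 1
        end
    end
    return failures
end

-- Example (assumes lua5.3 and test.lua are available in the current directory):
-- print(count_failures("lua5.3 test.lua", 50))
```

Running the same count before and after a change to voperation.lua gives a rough way to tell whether the change reduces the crashes, although a low count on an intermittent bug is not proof of a fix.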

The fix is to replace the line

    vips_lib.vips_object_unref_outputs(vop)

in voperation.lua by

    collectgarbage("stop")
    vips_lib.vips_object_unref_outputs(vop)
    collectgarbage("restart")

rolandlo, Mar 15 '24 17:03

The issue may be solved via #65, so I'm closing the ticket. Please reopen or open a new ticket if you still experience the issue.

rolandlo, Mar 16 '24 07:03