
memory corruption (?) in queue_submit

Open mkeeter opened this issue 4 years ago • 2 comments

Running futureproof on macOS 10.13.6 with the wgpu-native 0.6.0 release, I occasionally see crashes of the form:

futureproof(42110,0x7fffaf902380) malloc: *** error for object 0x10135da50: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Process 42110 stopped

This seems to happen most often when I'm repeatedly compiling and deploying new shaders.

Setting a breakpoint in malloc_error_break as requested, I see the following backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 2.1
  * frame #0: 0x00007fff770869e6 libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007fff77083a8c libsystem_malloc.dylib`szone_error + 392
    frame #2: 0x00007fff770797a5 libsystem_malloc.dylib`tiny_free_list_remove_ptr + 298
    frame #3: 0x00007fff7708eb46 libsystem_malloc.dylib`tiny_free_no_lock + 1450
    frame #4: 0x00007fff7708f2d2 libsystem_malloc.dylib`free_tiny + 628
    frame #5: 0x0000000100c6a2b2 libwgpu_native.dylib`wgpu_core::track::ResourceTracker$LT$S$GT$::remove_abandoned::h6df4d3456d52fb32 + 258
    frame #6: 0x0000000100c5f3b3 libwgpu_native.dylib`wgpu_core::device::life::LifetimeTracker$LT$B$GT$::triage_suspected::haf8866c85eb37146 + 1731
    frame #7: 0x0000000100cae27a libwgpu_native.dylib`wgpu_core::device::Device$LT$B$GT$::maintain::he293882be344d05d + 122
    frame #8: 0x0000000100c23e86 libwgpu_native.dylib`wgpu_core::device::queue::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::queue_submit::h6763fda89d062a10 + 6886
    frame #9: 0x0000000100044c95 futureproof`preview.Preview.redraw(self=0x000000011cb98180) at preview.zig:310
    frame #10: 0x000000010002c25e futureproof`renderer.Renderer.redraw(self=0x0000000115e75010, total_tiles=2688) at renderer.zig:437
... more frames of my application here

This is going to be hard to reproduce, since it's my random Zig application, but I figured I'd open the ticket and see if it rings any bells. The relevant call is here.

mkeeter, Dec 07 '20 00:12

Wow, that sounds like a memory stomp somewhere. It could be caused by the wgpu-native wrapper, or maybe by some of the Zig logic? We haven't seen this signature in wgpu-rs applications.

Could you perhaps run with an address sanitizer of some sort to figure out how the memory gets stomped?

kvark, Dec 07 '20 04:12

I'll look into it! It seems like Zig doesn't support asan easily, but it's definitely possible that my own code is breaking things 😆

mkeeter, Dec 07 '20 16:12
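
If ASan is hard to enable from Zig, one possible fallback is the malloc debugging built into macOS, which is controlled by environment variables and needs no rebuild. The sketch below is not something tried in this thread; the helper name `with_malloc_debug` and the `./futureproof` path are assumptions for illustration. The variables themselves are the ones described in the macOS `malloc(3)` man page.

```shell
# Hypothetical helper: run any command with macOS malloc debugging enabled.
# MallocScribble      fills freed memory with 0x55 so stale reads stand out
# MallocGuardEdges    adds guard pages around large allocations
# MallocStackLogging  records allocation backtraces for malloc_history
with_malloc_debug() {
    env MallocScribble=1 \
        MallocGuardEdges=1 \
        MallocStackLogging=1 \
        "$@"
}

# Usage (assumed binary path):
#   with_malloc_debug ./futureproof
#
# A heavier option is Guard Malloc, which places each allocation on its
# own page so an out-of-bounds write faults at the point of the write:
#   env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib ./futureproof
```

With stack logging on, `malloc_history <pid> <address>` can report where the pointer named in the malloc error was allocated, as long as the process is still alive (for example, stopped at the `malloc_error_break` breakpoint in lldb).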