SkiaSharp [BUG] iOS: Crash when GC collects SKMetalView resources

Description

SkiaSharp v3.116.1 .Net 9 maui 9.0.0/9.0.100 SDK 9.0.100
maui-ios 9.0.0/9.0.100 SDK 9.0.100
IDE: JetBrains Rider

In our app we have several views, one of which contains an SKMetalView. Every time the GC runs to collect resources, the app crashes with the following stack:

at <unknown> <0xffffffff>
	  at SkiaSharp.SkiaApi:gr_direct_context_abandon_context <0x00007>
	  at SkiaSharp.GRContext:AbandonContext <0x0010f>
	  at SkiaSharp.GRContext:DisposeNative <0x00083>
	  at SkiaSharp.SKNativeObject:Dispose <0x001c3>
	  at SkiaSharp.SKObject:Dispose <0x0007b>
	  at SkiaSharp.GRContext:Dispose <0x0007b>
	  at SkiaSharp.SKNativeObject:Finalize <0x000a7>
	  at System.Object:runtime_invoke_virtual_void__this__ <0x00087>

Code

To reproduce the issue SkiaSharpSample.zip:

Click Open Skia Screen, then go back
Click Open Text Screen, then go back
Repeat steps 1 and 2 and the app will crash

Expected Behavior

No response

Actual Behavior

No response

Version of SkiaSharp

3.116.0 (Current)

Last Known Good Version of SkiaSharp

2.88.9 (Previous)

IDE / Editor

Other (Please indicate in the description)

Platform / Operating System

iOS

Platform / Operating System Version

iOS 17.5 iOS 18.2

Devices

No response

Relevant Screenshots

No response

Relevant Log Output

at <unknown> <0xffffffff>
	  at SkiaSharp.SkiaApi:gr_direct_context_abandon_context <0x00007>
	  at SkiaSharp.GRContext:AbandonContext <0x0010f>
	  at SkiaSharp.GRContext:DisposeNative <0x00083>
	  at SkiaSharp.SKNativeObject:Dispose <0x001c3>
	  at SkiaSharp.SKObject:Dispose <0x0007b>
	  at SkiaSharp.GRContext:Dispose <0x0007b>
	  at SkiaSharp.SKNativeObject:Finalize <0x000a7>
	  at System.Object:runtime_invoke_virtual_void__this__ <0x00087>

Code of Conduct

[x] I agree to follow this project's Code of Conduct

Feb 28 '25 19:02 AlisiaNew

Interesting, I've also experienced an occasional crash using SKMetalView too - might be the same as this one. I notice that the backend context is created twice with the default constructor. So, related in a sense, but probably won't fix this issue:

https://github.com/mono/SkiaSharp/pull/3256

Apr 29 '25 04:04 jeremy-visionaid

OK, what I'm seeing is definitely the same problem as described here. It occurs on both iOS and MacCatalyst. As the original stack trace shows, there's a bug disposing/finalizing GRContext with the Metal backend. SKMetalView isn't even required to repro - just creating two GRContexts and disposing them is enough:

        CreateAndDisposeMetalContext();
        CreateAndDisposeMetalContext();

        private static void CreateAndDisposeMetalContext()
        {
            var device = MTLDevice.SystemDefault!;
            using (GRMtlBackendContext backendContext = new()
            {
                Device = device,
                Queue = device.CreateCommandQueue(),
            })
            {
                using GRContext context = GRContext.CreateMetal(backendContext);
            }

            GC.Collect();
            GC.WaitForPendingFinalizers();
        }

I'll have a look into fixing it, or at least adding the above as a unit test.

Apr 29 '25 23:04 jeremy-visionaid

So, the above code was based on what SKMetalView was doing... The actual crash is coming from here

Thread 35 Crashed:: .NET Long Running Task
0   libdispatch.dylib             	       0x18f9c6c7c _dispatch_semaphore_dispose.cold.1 + 40
1   libdispatch.dylib             	       0x18f993e0c _dispatch_semaphore_dispose + 72
2   libdispatch.dylib             	       0x18f9924e4 _dispatch_dispose + 208
3   libdispatch.dylib             	       0x18f99232c dispatch_release + 152
4   Metal                         	       0x19ac8526c -[_MTLCommandQueue dealloc] + 172
5   IOGPU                         	       0x1b145fa94 -[IOGPUMetalCommandQueue dealloc] + 112
6   AGXMetalG14X                  	       0x11a34a230 -[AGXG14XFamilyCommandQueue dealloc] + 84
7   SkiaSharp.Tests.Devices       	       0x102c85988 xamarin_dyn_objc_msgSend + 160 (trampolines-arm64-objc_msgSend.inc.s:99)
8   SkiaSharp.Tests.Devices       	       0x1030278e4 do_icall + 160 (interp.c:2268)
9   SkiaSharp.Tests.Devices       	       0x103025f20 do_icall_wrapper + 356 (interp.c:2350)
10  SkiaSharp.Tests.Devices       	       0x10301b500 mono_interp_exec_method + 2832
11  SkiaSharp.Tests.Devices       	       0x103019114 interp_runtime_invoke + 244 (interp.c:2109)
12  SkiaSharp.Tests.Devices       	       0x102f67420 mono_jit_runtime_invoke + 1320 (mini-runtime.c:3683)
13  SkiaSharp.Tests.Devices       	       0x102f07de8 do_runtime_invoke + 60 (object.c:2576) [inlined]
14  SkiaSharp.Tests.Devices       	       0x102f07de8 mono_runtime_invoke_checked + 148 (object.c:2792)
15  SkiaSharp.Tests.Devices       	       0x102f0f3ec mono_runtime_try_invoke_byrefs + 548 (object.c:5176)
16  SkiaSharp.Tests.Devices       	       0x102ecc070 ves_icall_InternalInvoke + 236 (icall.c:3615)
17  SkiaSharp.Tests.Devices       	       0x102ed5e88 ves_icall_InternalInvoke_raw + 100 (icall-def.h:371)
18  SkiaSharp.Tests.Devices       	       0x10302794c do_icall + 264 (interp.c:2298)
19  SkiaSharp.Tests.Devices       	       0x103025f58 do_icall_wrapper + 412 (interp.c:2354)
20  SkiaSharp.Tests.Devices       	       0x10301b500 mono_interp_exec_method + 2832
21  SkiaSharp.Tests.Devices       	       0x103019114 interp_runtime_invoke + 244 (interp.c:2109)
22  SkiaSharp.Tests.Devices       	       0x102f67420 mono_jit_runtime_invoke + 1320 (mini-runtime.c:3683)
23  SkiaSharp.Tests.Devices       	       0x102f07de8 do_runtime_invoke + 60 (object.c:2576) [inlined]
24  SkiaSharp.Tests.Devices       	       0x102f07de8 mono_runtime_invoke_checked + 148 (object.c:2792)
25  SkiaSharp.Tests.Devices       	       0x102f1c560 start_wrapper_internal + 524 (threads.c:1213) [inlined]
26  SkiaSharp.Tests.Devices       	       0x102f1c560 start_wrapper + 592 (threads.c:1271)
27  libsystem_pthread.dylib       	       0x18fb49c0c _pthread_start + 136
28  libsystem_pthread.dylib       	       0x18fb44b80 thread_start + 8

I can't say I'm familiar with Apple APIs or Objective C (and especially not Apple interop). But it looks/feels like a double free situation (i.e. abandonContext freeing the command queue followed by the finalizer). If I manually call retain on the command queue then it at least doesn't crash, but I'm not sure if this is just masks the problem or causes a leak in its place:

NSObject o = NSObject.FromObject(commandQueue);
o.DangerousRetain();

I've put in a draft pull request with a unit test and a proof of concept for the retain. Hopefully @mattleibow can take a look/take over!

Apr 30 '25 04:04 jeremy-visionaid

Some bits in Skia showing use of retain for the backend context. Looks like it calls through to CFRetain on the command buffer when creating a test context:

https://skia.googlesource.com/skia/+/7755e6ab0bc0/tools/gpu/mtl/MtlTestContext.mm

Apr 30 '25 23:04 jeremy-visionaid

Randomly (1 out of 3) getting this on iPhone when navigating away from a page:

=================================================================
  Managed Stacktrace:
=================================================================
    at <unknown> <0xffffffff>
    at SkiaSharp.SkiaApi:gr_direct_context_abandon_context <0x00042>
    at SkiaSharp.GRContext:AbandonContext <0x0006a>
    at SkiaSharp.GRContext:DisposeNative <0x0001e>
    at SkiaSharp.SKNativeObject:Dispose <0x000cc>
    at SkiaSharp.SKObject:Dispose <0x00020>
    at SkiaSharp.GRContext:Dispose <0x00020>
    at SkiaSharp.SKNativeObject:Finalize <0x0003a>
    at System.Object:runtime_invoke_virtual_void__this__ <0x00090>
    at <unknown> <0x00000>
=================================================================

INFO:   Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x193a27aa4):0x193a27a94  00 00 00 ea 6d ff 
INFO: ff 54 10 7c 5f c8 02 82 7d 92  ....m..T.|_...}.
0x193a27aa4  51 10 40 f9 f1 02 10 36 d0 03 00 36 11 fe 6c d3  [email protected]

So i can easily reproduce.. Dunno how this could help though?

A silly app-level workaround is to stop updating skglview when navigating away from page, so the grcontext is not in use after Dispose(false) is called maybe, something like that.

This is very similar to Windows double-buffered rendering, when you have queued renderings, you "think" you stopped the rendering and you call dispose/anything destroying and a queued backbuffer is still using some resources = crash. Guess here its same, as metal rendering is async too. Just that not many people use skglview on windows so we do not hear much from this: https://github.com/mono/SkiaSharp/issues/1490

Jun 08 '25 14:06 taublast

To reproduce the crash: one can use an SKGLView with either HasRenderLoop set to true or other fast updating means, so it's updating non-stop when you navigate from a page with it inside. This way we can reproduce a collision between Dispose (calls AbandonContext) and a queued rendering that accesses released resources.

Works for both iOS and Windows.

EDIT: Just catched another randomly happening crash on iOS when navigating away from a page with a non-metal SKCanvas (not SKGLView) view, but it had a native metal view inside:

Thread 2 name:  Finalizer
Thread 2 Crashed:
0   libobjc.A.dylib                        0x193a27aa4 objc_release + 16
1   libSkiaSharp                           0x10150841c 0x101208000 + 3146780
2   libSkiaSharp                           0x101508480 0x101208000 + 3146880
3   libSkiaSharp                           0x101405728 0x101208000 + 2086696
4   XXX.Mobile                       0x100810654 do_icall + 124
5   XXX.Mobile                       0x10080ece8 do_icall_wrapper + 348
6   XXX.Mobile                       0x100802260 mono_interp_exec_method + 2580
7   XXX.Mobile                       0x1007ff618 interp_entry_from_trampoline + 656
8   XXX.Mobile                       0x1005b5970 native_to_interp_trampoline + 112
9   XXX.Mobile                       0x1007ac380 mono_gc_run_finalize + 588
10  XXX.Mobile                       0x1006d9144 sgen_gc_invoke_finalizers + 264
11  XXX.Mobile                       0x1007ad594 finalizer_thread + 768
12  XXX.Mobile                       0x10078a9c8 start_wrapper + 348
13  libsystem_pthread.dylib                0x1e55040ec _pthread_start + 115
14  libsystem_pthread.dylib                0x1e550272c thread_start + 7

mono_gc_run_finalize.. ok maybe its mono runtime GC'ing something not recognized as being still in use?

Jun 08 '25 15:06 taublast

Fixed what was happening for me inside a custom handler, sharing for what it's worth. Can't say in general. What was happening for me:

inside public class SKMetalView (SkiaSharp current code in main branch) we create

            _backendContext = new GRMtlBackendContext
            {
                Device = _device,
                Queue = _device.CreateCommandQueue() // this queue will get killed by GC collect, triggered either by runtime, or by MAUI code when disconnecting handler. No idea why, this makes me thing of .net runtime issue.
            };

then we use it inside void IMTKViewDelegate.Draw(MTKView view):

using var commandBuffer = _backendContext.Queue.CommandBuffer();

And it looks like commandBuffer is using native resources from Queue. So if the commandBuffer finalizer is called after the Queue was disposed we get a crash trying to dispose some already disposed resources.

Queue can be disposed for some reason after base is called inside the handler protected override void DisconnectHandler(SKMetalView platformView). Or later depending on the GC pressure, this creates random situations.

If commandBuffer is disposed (notice using) after that (randomless++) we suffer:

 at <unknown> <0xffffffff>
	  at ObjCRuntime.Messaging:void_objc_msgSend <0x00098>
	  at ObjCRuntime.BaseWrapper:Release <0x00096>
	  at CoreFoundation.NativeObject:Dispose <0x00062>
	  at ObjCRuntime.DisposableObject:Finalize <0x00028>
	  at System.Object:runtime_invoke_virtual_void__this__ <0x00090>
	  at <unknown> <0x00000>

So the fix for SKMetalView was to pin the queue inside private void Initialize():

 _queuePin = GCHandle.Alloc(_backendContext.Queue, GCHandleType.Pinned);

and unpin inside Dispose:

 if (_queuePin.IsAllocated)
 {
     _queuePin.Free();
 }

I hope this could help.

Jun 09 '25 16:06 taublast

@taublast I don't think that's quite correct. The crash occurs on dispose - preventing the GC/finalization doesn't help with that, since it can be made to happen deterministically:

https://github.com/mono/SkiaSharp/compare/main...jeremy-visionaid:SkiaSharp:grcontext-metal-finalizer

Jun 09 '25 21:06 jeremy-visionaid

@jeremy-visionaid sure, it's all possible, i'm not an expert. Like i said, fixed the crash for myself and wanted to share.

Jun 10 '25 05:06 taublast

Looking at the apple docs: https://developer.apple.com/library/archive/documentation/Miscellaneous/Conceptual/MetalProgrammingGuide/Cmd-Submiss/Cmd-Submiss.html#//apple_ref/doc/uid/TP40014221-CH3-SW14

Command Queue A command queue accepts an ordered list of command buffers that the GPU will execute. All command buffers sent to a single queue are guaranteed to execute in the order in which the command buffers were enqueued. In general, command queues are thread-safe and allow multiple active command buffers to be encoded simultaneously.

To create a command queue, call either the newCommandQueue method or the newCommandQueueWithMaxCommandBufferCount: method of a MTLDevice object. In general, command queues are expected to be long-lived, so they should not be repeatedly created and destroyed.

Aug 18 '25 23:08 mattleibow

Yeah, the trouble here though appears to be that Skia itself is destroying the command queue, which I believe is the cause of the crash. In the upstream tests, they add a retain: https://skia.googlesource.com/skia/+/7755e6ab0bc0/tools/gpu/mtl/MtlTestContext.mm#48

Maybe we need to ask on the Skia forums about what's expected from the caller there...

Aug 18 '25 23:08 jeremy-visionaid

I am asking here, so we shall see: https://groups.google.com/g/skia-discuss/c/fYp8lLue4zU

Aug 18 '25 23:08 mattleibow

There is an incredible amount of "no retain" words in this block:

https://github.com/google/skia/blob/main/src/gpu/ganesh/mtl/GrMtlGpu.mm#L49-L75

I am wondering if we are supposed to retain?

Aug 18 '25 23:08 mattleibow

Lookin at instruments with an app, I see the ques being created and then they disappear as well - this is with the dangerous retain. Very interesting. I wonder what the latest skia does and if this is a currect skia bug or an always skia bug. Or a feature.

Aug 19 '25 00:08 mattleibow

Oh, is this a problem? Looks like it resets some smart pointers and then it immediately goes out of scope...

sk_sp<GrDirectContext> GrDirectContext::MakeMetal(void* device, void* queue,
                                                  const GrContextOptions& options) {
    sk_sp<GrDirectContext> direct(new GrDirectContext(GrBackendApi::kMetal, options));
    GrMtlBackendContext backendContext = {};
    backendContext.fDevice.reset(device);
    backendContext.fQueue.reset(queue);

    return GrDirectContext::MakeMetal(backendContext, options);
}

It's a deprecated function, and doesn't look like the first GrDirectContext is even used...

Aug 19 '25 00:08 jeremy-visionaid

Ooh, that is what we are using and it looks so weird.

This is the one we are suppsoed to be using and it looks like somehting got lost in the migration:

sk_sp<GrDirectContext> GrDirectContext::MakeMetal(const GrMtlBackendContext& backendContext,
                                                     const GrContextOptions& options) {
    sk_sp<GrDirectContext> direct(new GrDirectContext(GrBackendApi::kMetal, options));

    direct->fGpu = GrMtlTrampoline::MakeGpu(backendContext, options, direct.get());
    if (!direct->init()) {
        return nullptr;
    }

    return direct;
}

EDIT

I see the deprecated calls this one eventually. And options is null/the default object, so a duplicate is probably not a problem.

The "reset" is just the way the pointers work to "set" the value. It is effectively constructing a GrMtlBackendContext object and then passing that to this function.

Aug 19 '25 00:08 mattleibow

Yeah, I can't say whether it matters or not without poking at it, but the deprecated named constructor looks sketchy to me. I haven't compiled the native bits on mac before, but otherwise looks simple enough to at least do a quick hack to try out calling the other one... I imagine you'd beat me to it 😆

Aug 19 '25 01:08 jeremy-visionaid

Other places do things like this:

externals/skia/tools/gpu/mtl/MtlTestContext.mm:

backendContext.fDevice.retain((GrMTLHandle)device.get());
sk_cfp<id<MTLCommandQueue>> queue([*device newCommandQueue]);
backendContext.fQueue.retain((GrMTLHandle)queue.get());

externals/skia/tools/window/MetalWindowContext.mm:

GrMtlBackendContext backendContext = {};
backendContext.fDevice.retain((GrMTLHandle)fDevice.get());
backendContext.fQueue.retain((GrMTLHandle)fQueue.get());

So I think we are supposed to?

Aug 19 '25 01:08 mattleibow

I reviewed your PR and I tihnk for now, since the only place we own the queue is the SKMetalView, we should keep the retain there.

As soon as we have more data we can change.

I don't want to inadvertantly retain and then behaviour happens as a result and we can't revert. Maybe Google will say something exciting and we will be "oh yeah, that was smart! Good call Google" or something.

Aug 19 '25 01:08 mattleibow

We also need to test this on an actual iOS device.

Aug 19 '25 01:08 mattleibow

I'm wondering if the retains there are also to workaround the local scoping issues of GrMtlBackendContext 🤔 As an interrim thing, I don't think retaining in SKMetalView is too bad. It would at least fix the crashes that SkiaSharp users are currently getting, even if GRContext disposal were still broken without others needing to do an explicit retain. But if you've got a few more days then we can wait to see what the Skia guys say and/or I could try another approach.

Aug 19 '25 01:08 jeremy-visionaid

@mattleibow Do you know which version of XCode the native deps are expected to build against so I can build the native deps locally (it doesn't look like XCode 16.4/macOS 15.5 SDK works)? I can only see the container is macos14...

Aug 19 '25 02:08 jeremy-visionaid

Ah, nvm, #3058 is sufficient to fix the local build

Aug 19 '25 21:08 jeremy-visionaid

Now that I have a local native build, here's a quick and dirty hack to show the sketchy constructor in Skia is the cause of the problem:

sk_sp<GrDirectContext> GrDirectContext::MakeMetal(void* device, void* queue,
                                                  const GrContextOptions& options) {
    GrMtlBackendContext *backendContext = new GrMtlBackendContext();
    backendContext->fDevice.reset(device);
    backendContext->fQueue.reset(queue);

    return GrDirectContext::MakeMetal(*backendContext, options);
}

Cherry-picking the unit test commit on top, the test passes. So, I'd say that my proposed fix is near enough equivalent for the queue at least but without the leak (though it would still leave a potential issue for removable metal devices).

An alternative approach might be to tie GrMtlBackendContext to the GrDirectContext lifetime on the managed side. WDYT @mattleibow ?

Aug 19 '25 22:08 jeremy-visionaid

Are you saying removing the dodgy line is all it takes to fix it?

Aug 19 '25 23:08 mattleibow

I’m saying that it might be either the scope of the context or that they called reset instead of retain: https://github.com/mono/SkiaSharp/pull/3258#discussion_r2286464628

Aug 19 '25 23:08 jeremy-visionaid

SkiaSharp SkiaSharp copied to clipboard

[BUG] iOS: Crash when GC collects SKMetalView resources

Description

Code

Expected Behavior

Actual Behavior

Version of SkiaSharp

Last Known Good Version of SkiaSharp

IDE / Editor

Platform / Operating System

Platform / Operating System Version

Devices

Relevant Screenshots

Relevant Log Output

Code of Conduct

SkiaSharp
SkiaSharp copied to clipboard