MoltenVK icon indicating copy to clipboard operation
MoltenVK copied to clipboard

Uncaught selector, -[CAMetalDrawable addPresentedHandler:] (first introduced 1.2.6)

Open WalkerWhite opened this issue 1 year ago • 18 comments

I have been working with MoltenVK on top of SDL. I have not seen this error/warning anywhere else online. But I consistently get the message

Uncaught selector, -[CAMetalDrawable addPresentedHandler:] in the debug console when running Vulkan on top of an SDL Vulkan surface. It occurs EVERY frame, hurting performance and making debugging near impossible.

At first I though this was an SDL issue. I have rolled back multiple SDL versions and I always get this problem. I even get this issue in projects/tutorials produced by others online -- from people who report no such errors. But searching the SDL code I can see no point where addPresentedHandler would be used.

However, I noticed that these SDL tutorials link to older versions of MoltenVK. And the source of MoltenVK does call addPresentedHandler. So I tried varying the MoltenVK release. And this does indeed solve the problem.

Apparently this issue was introduced in v1.2.6-rc1 released on Oct 17, 2023. The previous 1.2.5 release on August 17, 2023 does not have this issue.

WalkerWhite avatar Jul 22 '24 19:07 WalkerWhite

This is interesting, as I use MoltenVK with an SDL2 game (RBDoom3-BFG) that uses an SDL2 Vulkan surface but does not experience this issue. MoltenVK will only add a presentedHandler if Metal can respond to this capability, which is then used to record actual presentation times. The presentation times are typically used by the application to profile, optimize, and/or control frame latency which is important for games. The MoltenVK code looks like this:

#if !MVK_OS_SIMULATOR
	if ([mtlDrawable respondsToSelector: @selector(addPresentedHandler:)]) {
		[mtlDrawable addPresentedHandler: ^(id<MTLDrawable> mtlDrwbl) {
			endPresentation(presentInfo, signaler, mtlDrwbl.presentedTime * 1.0e9);
		}];
	} else
#endif
	{
		// If MTLDrawable.presentedTime/addPresentedHandler isn't supported,
		// treat it as if the present happened when requested.
		endPresentation(presentInfo, signaler);
	}

So if your Metal implementation does not support adding a presentedHandler (e.g. OS/Metal version, graphics driver and hardware, etc), it should not be activated by MoltenVK.

The only thing I can think of in your case is presentation mode. This is just a hunch, but are you trying to set the vsync mode from SDL2 or from Vulkan during swapchain creation? I know MoltenVK works properly with SDL2 if you do this via the Vulkan API, but I am not sure how things behave if you are trying to control presentation mode directly from SDL2 (e.g. SDL_GL_SetSwapInterval or SDL_RENDERER_PRESENTVSYNC). I may be off base with your scenario, but I am trying to figure out the differences as they relate to presentation.

SRSaunders avatar Jul 23 '24 05:07 SRSaunders

Thanks for the response.

I left out a very important detail. This only happens on macOS. iOS is completely fine. If it matters, I am running Ventura 13.6.6 (keep meaning to upgrade to Sonoma) and using SDL 2.30.5 (latest), though I have tested against multiple versions of SDL.

To get this down to the root cause I did a very straightforward port of HelloTriangle from the Vulkan Tutorial:

https://vulkan-tutorial.com/code/17_swap_chain_recreation.cpp

Here is my diff:

1,2c1,3
< #define GLFW_INCLUDE_VULKAN
< #include <GLFW/glfw3.h>
---
> #include <SDL.h>
> #include <SDL_vulkan.h>
> #include <vulkan/vulkan.h>
26c27,28
<     VK_KHR_SWAPCHAIN_EXTENSION_NAME
---
>     VK_KHR_SWAPCHAIN_EXTENSION_NAME,
>     "VK_KHR_portability_subset"
77c79
<     GLFWwindow* window;
---
>     SDL_Window* window;
108a111,112
>     bool theWidth  = WIDTH;
>     bool theHeight = HEIGHT;
111c115
<         glfwInit();
---
>         SDL_Init(SDL_INIT_VIDEO);
113,117c117,120
<         glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
< 
<         window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
<         glfwSetWindowUserPointer(window, this);
<         glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);
---
>         SDL_Vulkan_LoadLibrary(nullptr);
>         Uint32 sdlflags = SDL_WINDOW_SHOWN | SDL_WINDOW_VULKAN | SDL_WINDOW_ALLOW_HIGHDPI | SDL_WINDOW_RESIZABLE;
>         window = SDL_CreateWindow("Vulkan", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
>                                   WIDTH, HEIGHT, sdlflags);
120,124d122
<     static void framebufferResizeCallback(GLFWwindow* window, int width, int height) {
<         auto app = reinterpret_cast<HelloTriangleApplication*>(glfwGetWindowUserPointer(window));
<         app->framebufferResized = true;
<     }
< 
142,144c140,158
<         while (!glfwWindowShouldClose(window)) {
<             glfwPollEvents();
<             drawFrame();
---
>         bool running = true;
>         while (running) {
>             SDL_Event windowEvent;
>             while (SDL_PollEvent(&windowEvent)) {
>                 if (windowEvent.type == SDL_QUIT) {
>                     running = false;
>                     break;
>                 } else if (windowEvent.type == SDL_WINDOWEVENT) {
>                     if (windowEvent.window.event == SDL_WINDOWEVENT_SIZE_CHANGED) {
>                         framebufferResized = true;
>                         theWidth = windowEvent.window.data1;
>                         theHeight = windowEvent.window.data2;
>                     }
>                 }
>             }
>             
>             if (running) {
>                 drawFrame();
>             }
187,189c201,202
<         glfwDestroyWindow(window);
< 
<         glfwTerminate();
---
>         SDL_DestroyWindow(window);
>         SDL_Quit();
193,199d205
<         int width = 0, height = 0;
<         glfwGetFramebufferSize(window, &width, &height);
<         while (width == 0 || height == 0) {
<             glfwGetFramebufferSize(window, &width, &height);
<             glfwWaitEvents();
<         }
< 
224a231
>         createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;
226a234,235
>         extensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);
>         extensions.emplace_back(VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME);
269c278
<         if (glfwCreateWindowSurface(instance, window, nullptr, &surface) != VK_SUCCESS) {
---
>         if (SDL_Vulkan_CreateSurface(window, instance, &surface) != SDL_TRUE) {
790c799,800
<             glfwGetFramebufferSize(window, &width, &height);
---
>             width = theWidth;
>             height = theHeight;
891,895c901,904
<         uint32_t glfwExtensionCount = 0;
<         const char** glfwExtensions;
<         glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);
< 
<         std::vector<const char*> extensions(glfwExtensions, glfwExtensions + glfwExtensionCount);
---
>         uint32_t extensionCount = 0;
>         SDL_Vulkan_GetInstanceExtensions(window, &extensionCount, nullptr);
>         std::vector<const char*> extensions(extensionCount);
>         SDL_Vulkan_GetInstanceExtensions(window, &extensionCount, extensions.data());
930c939,940
<         std::ifstream file(filename, std::ios::ate | std::ios::binary);
---
>         std::string path = std::string(SDL_GetBasePath())+"/"+filename;
>         std::ifstream file(path, std::ios::ate | std::ios::binary);
954c964
< int main() {
---
> int main(int argc, char* argv[]) {

As you can see, I am only using SDL to create the surface and manage the event loop (with the window resizing code). I also use SDL_GetBasePath() because the shaders are in the bundle.

The problem happens every time. If I drop 1.2.6 (or higher) of libMoltenVK.dylib into the Frameworks folder, I get the error. If I use the 1.2.5 version, it goes away.

WalkerWhite avatar Jul 23 '24 15:07 WalkerWhite

I left out a very important detail. This only happens on macOS.

Are you on an Intel Mac, or an Apple Silicon Mac?

cdavis5e avatar Jul 24 '24 03:07 cdavis5e

Are you on an Intel Mac, or an Apple Silicon Mac?

Apple SIlicon (M2 Max)

WalkerWhite avatar Jul 24 '24 14:07 WalkerWhite

I can duplicate the problem using the tutorial code with your patch. The difference between MoltenVK 1.2.5 and current versions is that addPresentedHandler() is always activated now to record actual presentation times vs. the past when it was activated only when setting presentation times as part of the VK_GOOGLE_display_timing extension:

old code:

VkResult MVKPresentableSwapchainImage::presentCAMetalDrawable(id<MTLCommandBuffer> mtlCmdBuff, MVKImagePresentInfo presentInfo) {
...
		if (presentInfo.hasPresentTime) {
			addPresentedHandler(mtlDrwbl, presentInfo, signaler);
		}
...
}

new code:

VkResult MVKPresentableSwapchainImage::presentCAMetalDrawable(id<MTLCommandBuffer> mtlCmdBuff, MVKImagePresentInfo presentInfo) {
...
		addPresentedHandler(mtlDrwbl, presentInfo, signaler);
...
}

Since I was involved in this change (as a bug submitter), I feel some responsibility here. I am wondering whether a conditional test should have been retained, but instead using a new boolean called presentInfo.hasPresentTimesInfo which would indicate if VK_GOOGLE_display_timing is active, whether just for collection of present times or actual setting of those times. The code would look like this:

/** Presentation info. */
typedef struct  {
	MVKPresentableSwapchainImage* presentableImage;
	MVKQueue* queue;				// The queue on which the vkQueuePresentKHR() command was executed.
	MVKFence* fence;				// VK_EXT_swapchain_maintenance1 fence signaled when resources can be destroyed
	uint64_t desiredPresentTime;  	// VK_GOOGLE_display_timing desired presentation time in nanoseconds
	uint32_t presentID;           	// VK_GOOGLE_display_timing presentID
	VkPresentModeKHR presentMode;	// VK_EXT_swapchain_maintenance1 present mode specialization
	bool hasPresentTime;      		// Keep track of whether presentation included VK_GOOGLE_display_timing
--> new
	bool hasPresentTimesInfo;		// Keep track of whether VK_GOOGLE_display_timing is active
} MVKImagePresentInfo;


MVKQueuePresentSurfaceSubmission::MVKQueuePresentSurfaceSubmission(MVKQueue* queue, const VkPresentInfoKHR* pPresentInfo)
	: MVKQueueSubmission(queue, pPresentInfo->waitSemaphoreCount, pPresentInfo->pWaitSemaphores, nullptr) {

	const VkPresentTimesInfoGOOGLE* pPresentTimesInfo = nullptr;
...
			case VK_STRUCTURE_TYPE_PRESENT_TIMES_INFO_GOOGLE:
				pPresentTimesInfo = (const VkPresentTimesInfoGOOGLE*) next;
				break;
...
new-->		if (pPresentTimesInfo) {
new-->			presentInfo.hasPresentTimesInfo = true;
			if (pPresentTimes) {
				presentInfo.hasPresentTime = true;
				presentInfo.presentID = pPresentTimes[scIdx].presentID;
				presentInfo.desiredPresentTime = pPresentTimes[scIdx].desiredPresentTime;
			}
		}
...
}

VkResult MVKPresentableSwapchainImage::presentCAMetalDrawable(id<MTLCommandBuffer> mtlCmdBuff, MVKImagePresentInfo presentInfo) {
...
new-->		if (presentInfo.hasPresentTimesInfo) {
			addPresentedHandler(mtlDrwbl, presentInfo, signaler);
		}
...
}

This change would be pretty easy to make, and I have verified it as working for this test app.

However, the issue is why is it happening here and not in the RBDoom3-BFG game or other SDL apps. I have a suspicion and it relates to threading. In the test app the error stems from SDL_PollEvent() which is called just after vkQueuePresentKHR() but likely before the presentation handler is triggered within MoltenVK on presentation complete for the frame. I am not an expert in the internals of SDL, but I suspect Xcode is reporting any yet-to-be-handled OS-level window events (incl. presentation) during SDL polling and reporting it in the console. I suspect this is because draw/present are on the same thread as polling. In my application, the UI thread with SDL polling is separate from the draw / presentation thread.

Note that if you run the triangle demo from the command line you will not see these OS-level diagnostic messages. Also, if you don't want to see them in Xcode, you can temporarily add the environment variable OS_ACTIVITY_MODE = disable to your scheme.

SRSaunders avatar Jul 24 '24 20:07 SRSaunders

I suspect this is because draw/present are on the same thread as polling. In my application, the UI thread with SDL polling is separate from the draw / presentation thread.

This makes sense to me. I am relatively new to Vulkan and am still in a "must draw on the main thread" mindset (though I have a lot of experience with audio rendering and using threads for long running tasks, so I am not thread shy).

Note that if you run the triangle demo from the command line you will not see these OS-level diagnostic messages. Also, if you don't want to see them in Xcode, you can temporarily add the environment variable OS_ACTIVITY_MODE = disable to your scheme.

I would prefer to avoid either of these options. My reason for using SDL over GLFW is so I can make bundled, double-clickable apps (educational developer) so neither command line nor environmental variables is desirable. And Apple frowns on distributing apps with diagnostics in them.

So the question is which is better: force me to separate event/drawing thread, or to make the patch you recommend? I suspect for beginners (e.g. my students) the latter might be nice as threads adds an extra cognitive load when just starting out.

WalkerWhite avatar Jul 24 '24 20:07 WalkerWhite

My reason for using SDL over GLFW is so I can make bundled, double-clickable apps (educational developer)

App bundles will not throw these diagnostic messages to the console (but perhaps to the log - not sure).

So the question is which is better: force me to separate event/drawing thread, or to make the patch you recommend?

I would be happy to submit this change as a PR. However, @billhollings and @cdavis5e would have to approve before accepting the change into the project.

SRSaunders avatar Jul 24 '24 22:07 SRSaunders

@SRSaunders

Thanks for all your detective work on this, and for your proposed solution!

I'm concerned that guarding addPresentedHandler: with presentInfo.hasPresentTimesInfo doesn't fix the original Uncaught selector, -[CAMetalDrawable addPresentedHandler:] error, and that it will just defer the problem to an app that uses VK_GOOGLE_display_timing, at which point, it will reappear, and then we'll be back here again.

  • I have never encountered an Uncaught selector, -[CAMetalDrawable addPresentedHandler:] error. Do we know why it is being triggered?
  • What does SDL_PollEvent() do that would trigger a problem on the main thread, but not another?
  • Why does threading have any kind of impact on that error?

billhollings avatar Jul 25 '24 02:07 billhollings

  • I have never encountered an Uncaught selector, -[CAMetalDrawable addPresentedHandler:] error. Do we know why it is being triggered?

I don't know, but if I had to guess, I'd say that we're sending the addPresentedHandler: message to the wrong object (i.e. an object that isn't actually a CAMetalDrawable). But how did the wrong object appear there in the first place?

What object is being passed to vkCreateMetalSurfaceEXT()? It needs to be a CAMetalLayer for this to work.

The only other possibility I can think of is some kind of latent memory corruption bug--where it is, or if it's even in MoltenVK, I don't know.

cdavis5e avatar Jul 25 '24 04:07 cdavis5e

What object is being passed to vkCreateMetalSurfaceEXT()? It needs to be a CAMetalLayer for this to work.

This is the relevant code in SDL_cocoavulkan.m. It is indeed a CAMetalLayer. There is a failover if vkCreateMetalSurfaceEXT is not found by the loader, using vkCreateMacOSSurfaceMVK. But I have a debug build of SDL and I can confirm that vkCreateMetalSurfaceEXT is indeed being used on the tutorial code that exhibits the problem.

    if (vkCreateMetalSurfaceEXT) {
        VkMetalSurfaceCreateInfoEXT createInfo = {};
        createInfo.sType = VK_STRUCTURE_TYPE_METAL_SURFACE_CREATE_INFO_EXT;
        createInfo.pNext = NULL;
        createInfo.flags = 0;
        createInfo.pLayer = (__bridge const CAMetalLayer *)
                            Cocoa_Metal_GetLayer(_this, metalview);
        result = vkCreateMetalSurfaceEXT(instance, &createInfo, NULL, surface);
        if (result != VK_SUCCESS) {
            Cocoa_Metal_DestroyView(_this, metalview);
            SDL_SetError("vkCreateMetalSurfaceEXT failed: %s",
                         SDL_Vulkan_GetResultString(result));
            return SDL_FALSE;
        }
    }

WalkerWhite avatar Jul 25 '24 13:07 WalkerWhite

I have investigated a few more things since my original response:

  1. The problem also exists with GLFW and not just when using SDL, so that aspect may be a red herring.
  2. The problem also exists after adding threads to the tutorial example code. So @billhollings is correct and that is not the correct path to pursue. While it was only a suspicion on my part, I'm sorry for starting down that path.
  3. The problem does not exist with the triangle example in Sascha Willems' Vulkan samples repo. Nor does it exist with the game app I work on - RBDoom3-BFG, even when VK_GOOGLE_display_timing is active.
  4. When running the tutorial example with my threading modifications, the message seems to appear as a result of calling vkAcquireNextImageKHR().

I can only surmise there is something different about swapchain image resource handling and/or synchronization between this tutorial and the other working scenarios. I don't understand the differences yet, but I will keep looking.

SRSaunders avatar Jul 25 '24 19:07 SRSaunders

Based upon @SRSaunders observation about Sascha Willem's code, I decided to do my own digging. I am not sure I have any answers, but I have some interesting data.

To get this data, I cloned MoltenVK and embedded the XCode project into both the tutorial code and the Sacha Willem's demo. I linked MoltenVK statically, removing the loader and the validation layer. That way I could set some breakpoints and discover what is going on.

The problem is indeed this code from MVKImage.m:

#if !MVK_OS_SIMULATOR
	if ([mtlDrawable respondsToSelector: @selector(addPresentedHandler:)]) {
		[mtlDrawable addPresentedHandler: ^(id<MTLDrawable> mtlDrwbl) {
			endPresentation(presentInfo, signaler, mtlDrwbl.presentedTime * 1.0e9);
		}];
	} else
#endif

The mtlDrawable is passing the guard, but the error is raised on the call to addPresenterHandler.

What is more interesting is what you get when you compare the variable watches in Sascha Willem's code to the tutorial. In Sascha Willem's code, this is what you see:

Screenshot 2024-07-25 at 6 24 00 PM

This object here is a straight-forward implementation of the CAMetalDrawable protocol. And that is because the Sascha Willem's code creates a CAMetalLayer directly using the basic allocator

[CAMetalLayer layer]

On the other hand, in the tutorial, you get this:

Screenshot 2024-07-25 at 6 29 18 PM

Notice this protocol is not a CAMetalDrawable, but is instead something called a CaptureMTLDrawable. I can find no documentation of this protocol anywhere, but I suspect it is some internal Apple class. As you can see it "extends" the CAMetalDrawable protocol using a delegate pattern (the _baseObject attribute). I suspect this delegation is what is failing, as it is failing to pass along the selector. Which suggests an Apple problem.

Why does SDL have this type of drawable? Because SDL does not create a CAMetalLayer directly. Instead, it creates an NSView and uses wantsUpdateLayer and layerClass to request a CAMetalLayer from the OS. The reason for this is window resizing. This way the OS automatically changes the layer when the window is resized. I suspect GLFW does the same thing.

The only remaining question is why @SRSaunders is not having a problem with RBDoom3-BFG. Because even if the window resizing flag is disabled, SDL still creates the window using this resizable pattern.

WalkerWhite avatar Jul 25 '24 23:07 WalkerWhite

I will add that if this turns out to be an Apple problem, it would be nice to add a guard like the one @SRSaunders proposed as we should not be using broken Apple features by default.

WalkerWhite avatar Jul 26 '24 00:07 WalkerWhite

Why does SDL have this type of drawable? Because SDL does not create a CAMetalLayer directly. Instead, it creates an NSView and uses wantsUpdateLayer and layerClass to request a CAMetalLayer from the OS. The reason for this is window resizing. This way the OS automatically changes the layer when the window is resized. I suspect GLFW does the same thing.

The name CaptureMTLDrawable suggests it is related to Metal frame capture. Are you running with frame capture enabled? Does SDL and/or GLFW enable it?

cdavis5e avatar Jul 26 '24 00:07 cdavis5e

The name CaptureMTLDrawable suggests it is related to Metal frame capture. Are you running with frame capture enabled? Does SDL and/or GLFW enable it?

Aha! That was indeed it.

New XCode projects enable automatic capture in Debug mode by default (I always forget about this because capture does nothing in OpenGL). This error appears if either Automatic or Metal are enabled. When capture is turned off in the Scheme completely, the error goes away.

WalkerWhite avatar Jul 26 '24 01:07 WalkerWhite

When capture is turned off in the Scheme completely, the error goes away.

Thanks @cdavis5e and @WalkerWhite for figuring this out.

SRSaunders avatar Jul 26 '24 02:07 SRSaunders

When capture is turned off in the Scheme completely, the error goes away.

Thanks @cdavis5e and @WalkerWhite for figuring this out.

Thanks for all the investigative work by everyone!

@WalkerWhite

Excellent detective work! Thanks!

CaptureMTLDrawable should be plug-compatible with CAMetalDrawable, as it's intended to be substituted by the tools (Xcode & runtime frame capture), to support frame captures, but otherwise it should support everything CAMetalDrawable does. If it does not, this may be a bug in the Apple tool class implementations.

What version of macOS and Xcode are you using?

billhollings avatar Jul 26 '24 02:07 billhollings

What version of macOS and Xcode are you using?

Ventura 13.6.6 on a M2 Max. I have the last Xcode update for Ventura, which is version 15.2 (15C500b).

I am planning on upgrading to Sonoma in the next week (I always upgrade in the summer, once the bugs are worked out, and before the semester starts). If necessary I could report if this persists.

WalkerWhite avatar Jul 26 '24 14:07 WalkerWhite

Follow-up. I finally installed Sonoma and can now run XCode 15.4. The problem is gone. Apparently this is an XCode bug and it only affects Ventura users.

WalkerWhite avatar Aug 17 '24 14:08 WalkerWhite

Follow-up. I finally installed Sonoma and can now run XCode 15.4. The problem is gone. Apparently this is an XCode bug and it only affects Ventura users.

Glad to hear it was fixed for you. Thanks for reporting back.

billhollings avatar Aug 19 '24 13:08 billhollings