
Wishlist: Vulkan backend for Linux

Open tidux opened this issue 11 months ago • 4 comments

The Asahi Linux project has shipped Vulkan drivers for the M chips. Adding support for this stack would allow better cluster management (e.g. exo) for multi machine mlx setups. It would also, in theory, allow running AI models on other unified memory hardware with Linux support, such as AMD APUs. This would actually fulfill a major design goal of Vulkan (and its AMD ancestor Mantle) by providing a common library for unified compute, without the overhead of OpenCL.

tidux avatar Jan 04 '25 23:01 tidux

This is a pretty major undertaking and it's unlikely we will have bandwidth to work on it in the near future.

We'd need a Vulkan runtime back-end and presumably we'd need to rewrite all of the compute kernels in OpenGL SL (?).

In theory it's all doable. Given that the Vulkan API is similar to Metal, it should be pluggable as a back-end in MLX. It would be pretty interesting to see what this looks like if someone is interested in working on it.

> Adding support for this stack would allow better cluster management (e.g. exo) for multi machine mlx setups

Curious, why so?

awni avatar Jan 06 '25 15:01 awni

> Curious, why so?

Linux lets you pass GPU devices into a container, even in a Kubernetes cluster. This means that rather than running something like ansible arm_mac_cluster -a "exo 1>/dev/null 2>&1 &!" to start the mlx-using processes, you can apply a Kubernetes Deployment manifest and run it like a real cluster program. If you aren't using exo and need to run your own MLX jobs via a batch framework, there are Kubernetes options like Argo, or non-containerized, HPC-focused batch frameworks. Linux also won't throw a screaming fit if you try to run new binaries without manually approving them through the GUI on each host.
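
As a rough illustration of the Deployment approach, here is a minimal Python sketch that builds such a manifest as a plain dict and prints it as JSON. The image name and the GPU resource name are hypothetical placeholders: the real resource name depends on whichever device plugin exposes the GPU on your nodes, and nothing here is specific to exo or MLX.

```python
import json


def build_deployment(name, image, gpu_resource, replicas=1):
    """Return a Kubernetes apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Device-plugin (extended) resources are
                            # requested under "limits".
                            "resources": {"limits": {gpu_resource: 1}},
                        }
                    ]
                },
            },
        },
    }


manifest = build_deployment(
    name="mlx-worker",
    image="registry.example.com/mlx-worker:latest",  # hypothetical image
    gpu_resource="example.com/gpu",  # hypothetical resource name
    replicas=3,
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving the output to a file such as deployment.json and running kubectl apply -f deployment.json would then start the workers, assuming the cluster actually advertises a resource under that name.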

tidux avatar Jan 06 '25 17:01 tidux

> without the overhead of OpenCL.

What does this mean? Asahi Linux also ships conformant OpenCL 3.0 drivers. I would expect similar performance between OpenCL and Vulkan compute, since both use the same backend compiler.

> We'd need a Vulkan runtime back-end and presumably we'd need to rewrite all of the compute kernels in OpenGL SL (?).

Not necessarily. Anything that can compile to appropriate SPIR-V would work, and that includes HLSL, GLSL, etc.

For OpenCL, it'd likely be OpenCL C, but it could also be an appropriate SPIR-V target.
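
Whichever source language is used, what the driver ends up consuming is a SPIR-V binary. As a small sketch of what that means in practice, the Python below parses the five-word SPIR-V module header (magic number, version, generator, ID bound, schema). The sample bytes are hand-built for illustration and are not the output of any real compiler, and the function name is made up for this example.

```python
import struct

SPIRV_MAGIC = 0x07230203  # magic number defined by the SPIR-V specification


def parse_spirv_header(blob):
    """Parse the first five 32-bit words of a SPIR-V module."""
    if len(blob) < 20:
        raise ValueError("too short to hold a SPIR-V header")
    # The magic number also reveals the byte order of the module.
    for fmt in ("<5I", ">5I"):
        magic, version, generator, bound, schema = struct.unpack_from(fmt, blob)
        if magic == SPIRV_MAGIC:
            return {
                # Version word layout: 0 | major | minor | 0 (one byte each).
                "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
                "generator": generator,
                "bound": bound,
                "schema": schema,
                "little_endian": fmt.startswith("<"),
            }
    raise ValueError("not a SPIR-V module (bad magic number)")


# Hand-built header claiming SPIR-V 1.3, for illustration only.
sample = struct.pack("<5I", SPIRV_MAGIC, 0x00010300, 0, 8, 0)
header = parse_spirv_header(sample)
print(header)
```

A check like this only confirms that a file looks like SPIR-V. Whether a module is "appropriate" for Vulkan compute or for OpenCL depends on its capabilities and execution model, which this sketch does not inspect.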

alyssarosenzweig avatar Jan 21 '25 14:01 alyssarosenzweig

Could we use SPIRV-Cross to convert Metal Shading Language (MSL) kernels directly into SPIR-V for Vulkan? This way, we wouldn’t have to rewrite all the compute kernels for a new backend. SPIRV-Cross handles most of the work of translating MSL, and then we can feed the resulting SPIR-V into Vulkan.

It might still be complex, but using SPIRV-Cross could reduce duplication of effort and ease maintenance compared to a full rewrite.

Thoughts? @awni @tidux @alyssarosenzweig

If you think this is a viable solution I'd like to take this task.

NripeshN avatar Feb 06 '25 16:02 NripeshN