AI models in microVMs
Feature request description
We may already have some of this functionality; for comparison:
https://github.com/microsandbox/microsandbox
It should be just:
ramalama run --oci-runtime krun smollm:135m
Needs testing to ensure GPU access works. We should do a follow-on blog post to this one:
https://developers.redhat.com/articles/2025/02/20/how-ramalama-runs-ai-models-isolation-default
but take it one step further, encapsulating the workload in microVMs. Tagging @slp for awareness.
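A quick sanity check for GPU access (just a sketch, assuming podman, the krun OCI runtime, and the quay.io/ramalama/ramalama image are all installed and the image lets you run an arbitrary command) is to look for the DRI render nodes from inside the microVM:
# If /dev/dri is missing or empty inside the guest, GPU passthrough is not working.
podman run --runtime=krun --rm --device /dev/dri quay.io/ramalama/ramalama ls -l /dev/dri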
Suggest potential solution
No response
Have you considered any alternatives?
No response
Additional context
No response
First attempt: GPU passthrough isn't working (I tried with an AMD GPU), and there's a weird issue where ">" displays as "e", possibly because of the emoji, TERM, or something else:
ramalama run --oci-runtime krun smollm:135m
🦭 e
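One way to narrow that down (a sketch, assuming the same quay.io/ramalama/ramalama image and that it will run an arbitrary command) is to check what terminal and locale settings the guest actually sees:
# An unset or odd TERM, or a non-UTF-8 locale inside the microVM, would explain the prompt glyph rendering incorrectly.
podman run --runtime=krun -ti --rm quay.io/ramalama/ramalama sh -c 'echo "TERM=$TERM"; locale'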
I'm assuming the goal is to make it work on Linux, right?
Well, the goal would be to make this work on a Mac to launch a Linux container with GPU acceleration.
@slp I intended Linux... I think that would be a good start. We could potentially do macOS as a follow-on if we thought it was worth it. The command I tried above was on Linux with an AMD GPU (ROCm).
Sure, I thought you wanted to compete against Apple's container.
What is the value on Linux, where I can just use traditional containers? Can krun currently pass arbitrary GPUs, like NVIDIA, through to the VM?
I don't see the point in competing with Apple's container right now, I mean podman/podman-machine/krunkit is significantly better IMO.
It's more about competing with solutions like this:
https://github.com/microsandbox/microsandbox
Some people might not be confident container encapsulation is enough, so we can also provide microVM+container isolation.
I mean, that project gaining popularity overnight is proof enough that some people value this (and it's using libkrun), and it's already a feature of podman...
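For context, the microVM path really is already wired up in podman; a minimal check (assuming the krun runtime and that image are installed) is that the guest reports its own kernel rather than the host's:
command -v krun                     # the libkrun-based OCI runtime binary
podman run --runtime=krun --rm quay.io/ramalama/ramalama uname -r   # prints the guest kernel, not the host's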
A friendly reminder that this issue had no activity for 30 days.
@slp thoughts on this? Would this work or do we need to enhance libkrun on Linux to handle GPU?
Not stale
It should work with the latest version of libkrun that just landed on Fedora. I'll confirm it tomorrow.
I can confirm this works on Fedora 42 with libkrun-1.14.0-1.fc42.x86_64 and the latest ramalama container image in quay:
podman run --runtime=krun -ti --rm --device /dev/dri -v ~/models:/models quay.io/ramalama/ramalama
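For anyone following along, the flags there do the heavy lifting:
# --runtime=krun        run the container inside a libkrun microVM instead of a plain namespaced container
# --device /dev/dri     expose the host's GPU render nodes to the guest
# -v ~/models:/models   bind-mount the local model directory into the container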
This doesn't work with ramalama run --oci-runtime krun smollm:135m because it pulls quay.io/ramalama/rocm instead of quay.io/ramalama/ramalama. Perhaps having --oci-runtime krun should imply using the default container image?
ROCm would require --device /dev/kfd --device /dev/dri but I understand it may not be desired to look into ROCm at this point in time. Could be too much effort.
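If someone does want to experiment, the ROCm variant would presumably just compose those devices with the rocm image mentioned above (an untested sketch; whether ROCm actually works over krun's GPU path is still an open question here):
podman run --runtime=krun -ti --rm --device /dev/kfd --device /dev/dri -v ~/models:/models quay.io/ramalama/rocm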
ROCm is more widely used, at least for enterprise AI; for example, you could claim accelerated AI microVMs with vLLM if this worked for ROCm.
Does
ramalama run --oci-runtime krun --image quay.io/ramalama/ramalama smollm:135m
Work?
We do have a hard-coded case of --runtime vllm pulling a particular image, so this is not unprecedented. It would be nice if we could somehow specify which image to use with krun.
On ROCm, I would like to get to the point where we use the ramalama image by default on ROCm systems and force people to override when they want to use the rocm image.
Does
ramalama run --oci-runtime krun --image quay.io/ramalama/ramalama smollm:135m
Work?
It does! Tested with smollm and deepseek.
ROCm would require
--device /dev/kfd --device /dev/dri
but I understand it may not be desired to look into ROCm at this point in time. Could be too much effort.
We can't transport ROCm over Venus. We could study doing something over native context, but unless there's a huge difference in performance, I think it's better to standardize on Vulkan over Venus.
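A quick way to confirm the Vulkan-over-Venus path is actually in use (a sketch, assuming vulkan-tools is available inside the image) would be:
# With Venus, the Vulkan device reported inside the guest should be the virtio-gpu Venus driver backed by the host GPU.
podman run --runtime=krun -ti --rm --device /dev/dri quay.io/ramalama/ramalama vulkaninfo --summary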
So if we change the default to Vulkan, then the case we would need to worry about is CUDA?
And vLLM (because vLLM only uses CUDA/ROCm).