
AI models in microVMs


Feature request description

Some of this functionality already exists elsewhere:

https://github.com/microsandbox/microsandbox

It should be just:

ramalama run --oci-runtime krun smollm:135m

Needs testing to ensure GPU access works. We should do a follow-on blog post to this:

https://developers.redhat.com/articles/2025/02/20/how-ramalama-runs-ai-models-isolation-default

but taking it one step further, encapsulating in microVMs. Tagging @slp for awareness.
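
Under the hood, this would presumably boil down to ramalama invoking podman with the krun OCI runtime, roughly along these lines (illustrative sketch, untested):

podman run --runtime=krun -ti --rm --device /dev/dri quay.io/ramalama/ramalama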

Suggest potential solution

No response

Have you considered any alternatives?

No response

Additional context

No response

ericcurtin avatar Jun 16 '25 09:06 ericcurtin

First attempt: GPU passthrough isn't working (I tried with an AMD GPU), and there's some weird issue where ">" displays as "e". Could be because of the emoji, TERM, or whatever.

ramalama run --oci-runtime krun smollm:135m
🦭 e

ericcurtin avatar Jun 16 '25 10:06 ericcurtin

I'm assuming the goal is to make it work on Linux, right?

slp avatar Jun 16 '25 12:06 slp

Well, the goal would be to make this work on a Mac to launch a Linux container with GPU acceleration.

rhatdan avatar Jun 16 '25 13:06 rhatdan

@slp I meant Linux... I think that would be a good start... We could potentially do macOS as a follow-on, I guess, if we thought it was worth it... The command I tried above was on Linux with an AMD GPU (ROCm).

ericcurtin avatar Jun 16 '25 16:06 ericcurtin

Sure, I thought you wanted to compete against Apple's container.

What is the value on Linux, where I can just use traditional containers? Can krun currently leak random GPUs, like NVIDIA's, into the VM?

rhatdan avatar Jun 16 '25 19:06 rhatdan

> Sure, I thought you wanted to compete against Apple's container.
>
> What is the value on Linux, where I can just use traditional containers? Can krun currently leak random GPUs, like NVIDIA's, into the VM?

I don't see the point in competing with Apple's container right now; podman/podman-machine/krunkit is significantly better IMO.

It's more about competing with solutions like this:

https://github.com/microsandbox/microsandbox

Some people might not be confident that container encapsulation is enough, so we can also provide microVM+container isolation.

I mean, that project gaining popularity overnight is proof enough that some people value this (and it's using libkrun). Plus, it's already a feature of podman...

ericcurtin avatar Jun 16 '25 22:06 ericcurtin

A friendly reminder that this issue has had no activity for 30 days.

github-actions[bot] avatar Jul 24 '25 00:07 github-actions[bot]

@slp thoughts on this? Would this work, or do we need to enhance libkrun on Linux to handle the GPU?

rhatdan avatar Jul 24 '25 10:07 rhatdan

Not stale

rhatdan avatar Aug 26 '25 12:08 rhatdan

It should work with the latest version of libkrun that just landed on Fedora. I'll confirm it tomorrow.
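
For anyone following along, the exact build can be checked with a standard rpm query:

rpm -q libkrun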

slp avatar Aug 26 '25 14:08 slp

I can confirm this works on Fedora 42 with libkrun-1.14.0-1.fc42.x86_64 and the latest ramalama container image on Quay:

podman run --runtime=krun -ti --rm --device /dev/dri -v ~/models:/models quay.io/ramalama/ramalama

This doesn't work with ramalama run --oci-runtime krun smollm:135m because it pulls quay.io/ramalama/rocm instead of quay.io/ramalama/ramalama. Perhaps having --oci-runtime krun should imply using the default container image?

slp avatar Aug 29 '25 08:08 slp

ROCm would require --device /dev/kfd --device /dev/dri, but I understand it may not be desirable to look into ROCm at this point in time. Could be too much effort.
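
For illustration, a hypothetical ROCm variant of slp's command above would presumably look like this (untested sketch; the rocm image is the one ramalama currently pulls by default on ROCm systems):

podman run --runtime=krun -ti --rm --device /dev/kfd --device /dev/dri -v ~/models:/models quay.io/ramalama/rocm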

ROCm is more widely used, at least for enterprise AI; for example, you could claim accelerated AI microVMs with vLLM if this worked for ROCm.

ericcurtin avatar Aug 29 '25 11:08 ericcurtin

Does ramalama run --oci-runtime krun --image quay.io/ramalama/ramalama smollm:135m work?

We do have a hard-coded case of --runtime vllm pulling a particular image, so this is not unprecedented. It would be nice if we could somehow specify which image to use with krun.
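
As a rough sketch, assuming the image key in ramalama.conf works the way it's documented, pinning an image globally could look like this today (values are illustrative):

[ramalama]
image = "quay.io/ramalama/ramalama"

A per-runtime mapping (e.g. krun -> default image) would be the more general fix.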

On ROCm, I would like to get to the point where we use the ramalama image by default on ROCm systems and force people to override when they want to use the rocm image.

rhatdan avatar Aug 29 '25 11:08 rhatdan

> Does ramalama run --oci-runtime krun --image quay.io/ramalama/ramalama smollm:135m work?

It does! Tested with smollm and deepseek.

slp avatar Aug 29 '25 14:08 slp

> ROCm would require --device /dev/kfd --device /dev/dri, but I understand it may not be desirable to look into ROCm at this point in time. Could be too much effort.

We can't transport ROCm over Venus. We could study doing something over native context, but unless there is a huge difference in performance, I think it's better to standardize on Vulkan over Venus.
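
If anyone wants to sanity-check the Venus path from inside the guest, vulkaninfo (from vulkan-tools) should list the virtio-gpu device; something along these lines, assuming the usual device naming:

vulkaninfo --summary | grep -i virtio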

slp avatar Aug 29 '25 14:08 slp

So if we change the default to Vulkan, then the case we would need to worry about is CUDA?

rhatdan avatar Sep 02 '25 14:09 rhatdan

And vLLM (because vLLM only uses CUDA/ROCm).

ericcurtin avatar Sep 02 '25 16:09 ericcurtin