
(Question) Model request too large for system

Open · hardBSDk opened this issue 6 months ago · 5 comments

Why does this happen when I try to run big models?

I think it's probably because of my small amount of VRAM; I use an integrated AMD GPU with 16 GB of RAM.

hardBSDk avatar Jul 01 '25 08:07 hardBSDk

Integrated GPUs (iGPUs) do not have the compute power needed for LLM inference, and they are unfortunately not supported by Ollama. Your system therefore falls back to the CPU, where the available RAM is usually already more heavily occupied by other programs.
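
If you want to verify where a loaded model actually ended up, here is a minimal sketch that asks the local Ollama server. It assumes a recent Ollama version exposing the `/api/ps` endpoint on the default port 11434 and reporting `size` and `size_vram` fields:

```python
# Minimal sketch: list the models Ollama currently has loaded and show
# how much of each is in VRAM vs. system RAM.
# Assumes an Ollama server on the default port and the /api/ps endpoint.
import requests

resp = requests.get("http://localhost:11434/api/ps", timeout=5)
resp.raise_for_status()

GB = 1024 ** 3
for model in resp.json().get("models", []):
    size = model.get("size", 0)            # total memory the model occupies
    size_vram = model.get("size_vram", 0)  # portion offloaded to the GPU
    print(f"{model['name']}: {size / GB:.1f} GB total, "
          f"{size_vram / GB:.1f} GB in VRAM")
    if size_vram == 0:
        print("  -> running entirely on the CPU")
```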

Which models did you try to run?

mags0ft avatar Jul 01 '25 09:07 mags0ft

@mags0ft Some parameter sizes below and above 1 billion work, some don't (like the 1.7B version of SmolLM2).

hardBSDk avatar Jul 01 '25 11:07 hardBSDk

How much RAM and swap are available at that time? For me, Qwen 3 4B Q4_K_M works comfortably with around 6-8 GB of RAM left. Always keep in mind that the LLM context also takes up space.

mags0ft avatar Jul 03 '25 09:07 mags0ft

A good way of telling whether a model can run is to convert its parameter count to GB like this and add 2 GB:

Qwen3 (4B) -> 4 GB + 2 GB = 6 GB (so you probably want about 6 GB of free RAM)
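
As a small helper, this is a sketch of that rule of thumb only, not an official Alpaca/Ollama calculation; the function names and the fixed 2 GB overhead are assumptions taken from this comment:

```python
import psutil  # third-party: pip install psutil


def estimate_required_gb(params_billion: float, overhead_gb: float = 2.0) -> float:
    """Rule of thumb from this thread: ~1 GB per billion parameters
    plus a fixed overhead (context, runtime, etc.)."""
    return params_billion + overhead_gb


def probably_fits(params_billion: float) -> bool:
    """Compare the estimate against the memory currently available."""
    available_gb = psutil.virtual_memory().available / 1024 ** 3
    return estimate_required_gb(params_billion) <= available_gb


# Example: Qwen3 4B -> 4 GB + 2 GB = 6 GB needed
print(estimate_required_gb(4))  # 6.0
print(probably_fits(4))         # True only if ~6 GB of RAM is currently free
```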

Jeffser avatar Jul 03 '25 19:07 Jeffser

@Jeffser Weird, because I have 16GB, so there should be plenty of memory available for programs.

Do you think the RadeonSI, RADV and AMDGPU drivers are not increasing the VRAM allocation even when memory is available?

hardBSDk avatar Jul 04 '25 04:07 hardBSDk