(Question) Model request too large for system
Why does this happen when I try to run big models?
I think it's probably because of my small amount of VRAM; I use an integrated AMD GPU with 16 GB of RAM.
Integrated GPUs (iGPUs) don't have the compute power needed for LLM inference, and unfortunately they are not supported by Ollama. Your system therefore falls back to the CPU, where RAM is usually already much more heavily used.
Which models did you try to run?
@mags0ft Some model sizes below and above 1 billion parameters work, some don't (like the 1.7B version of SmolLM2).
How much RAM and swap are available at that time? For me, Qwen 3 4B Q4_K_M works comfortably with around 6-8 GB of RAM left. Always keep in mind that the LLM context also takes up space.
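If you're on Linux, a quick way to check this is to read `/proc/meminfo`. Here's a small Python sketch (just an illustration, not part of Ollama or Alpaca):

```python
# Minimal Linux-only sketch: read /proc/meminfo to see how much RAM and swap
# are actually free before loading a model.

def meminfo_gb():
    """Return MemAvailable and SwapFree from /proc/meminfo, in GB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.strip().split()[0])  # values are in kB
    to_gb = lambda kb: kb / (1024 ** 2)
    return to_gb(values["MemAvailable"]), to_gb(values["SwapFree"])

if __name__ == "__main__":
    ram_gb, swap_gb = meminfo_gb()
    print(f"Available RAM: {ram_gb:.1f} GB")
    print(f"Free swap:     {swap_gb:.1f} GB")
```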
A good way of telling whether a model can run is to convert its size to GB like this and add 2 GB:
Qwen3 (4B) -> 4 GB + 2 GB = 6 GB (so you probably want 6 gigs free in RAM)
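Here's a quick Python sketch of that rule of thumb. It assumes the "size" is roughly the parameter count in billions (e.g. a 4B model ≈ 4 GB), which is only a ballpark figure for ~4-bit quantized weights:

```python
# Rough sketch of the "size in GB + 2 GB" rule of thumb above.
# Assumption: model size in GB is approximated by its parameter count
# in billions, which is only a rough estimate for 4-bit quants.

def estimated_ram_gb(params_billions: float, overhead_gb: float = 2.0) -> float:
    """Very rough estimate of the free RAM needed to run a model on CPU."""
    return params_billions + overhead_gb

for name, size_b in [("SmolLM2 1.7B", 1.7), ("Qwen3 4B", 4.0)]:
    print(f"{name}: want about {estimated_ram_gb(size_b):.1f} GB of RAM free")
```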
@Jeffser Weird, because I have 16 GB, so there should be plenty of memory available for programs.
Do you think the RadeonSI, RADV, and AMDGPU drivers are not increasing the VRAM allocation even when memory is available?