(Question) Model request too large for system
Why does this happen when I try to run big models?
I think it's probably because of my small amount of VRAM; I use an integrated AMD GPU with 16 GB of RAM.
Integrated GPUs (iGPUs) don't have the compute power needed for LLM inference, and unfortunately they are not supported by Ollama. Your system therefore falls back to the CPU, where RAM is usually already much more heavily used.
Which models did you try to run?
@mags0ft Some model sizes below and above 1 billion parameters work, some don't (like the 1.7B version of SmolLM2).
How much RAM and swap are available at that time? For me, Qwen 3 4B Q4_K_M works comfortably with around 6-8 GB of RAM left. Always keep in mind that the LLM context also takes up space.
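If you're on Linux, a quick way to check this is to read `/proc/meminfo`. Here's a small Python sketch (just an illustration, not part of Ollama or Alpaca):

```python
# Minimal Linux-only sketch: read /proc/meminfo to see how much RAM and swap
# are actually free before loading a model.

def meminfo_gb():
    """Return MemAvailable and SwapFree from /proc/meminfo, in GB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.strip().split()[0])  # values are in kB
    to_gb = lambda kb: kb / (1024 ** 2)
    return to_gb(values["MemAvailable"]), to_gb(values["SwapFree"])

if __name__ == "__main__":
    ram_gb, swap_gb = meminfo_gb()
    print(f"Available RAM: {ram_gb:.1f} GB")
    print(f"Free swap:     {swap_gb:.1f} GB")
```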
A good way of telling whether a model can run is to convert its size to GB like this and add 2 GB:
Qwen3 (4B) -> 4 GB + 2 GB = 6 GB (so you probably want 6 gigs free in RAM)
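Here's a quick Python sketch of that rule of thumb. It assumes the "size" is roughly the parameter count in billions (e.g. a 4B model ≈ 4 GB), which is only a ballpark figure for ~4-bit quantized weights:

```python
# Rough sketch of the "size in GB + 2 GB" rule of thumb above.
# Assumption: model size in GB is approximated by its parameter count
# in billions, which is only a rough estimate for 4-bit quants.

def estimated_ram_gb(params_billions: float, overhead_gb: float = 2.0) -> float:
    """Very rough estimate of the free RAM needed to run a model on CPU."""
    return params_billions + overhead_gb

for name, size_b in [("SmolLM2 1.7B", 1.7), ("Qwen3 4B", 4.0)]:
    print(f"{name}: want about {estimated_ram_gb(size_b):.1f} GB of RAM free")
```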
@Jeffser Weird, because I have 16 GB, so there should be plenty of memory available for programs.
Do you think the RadeonSI, RADV, and AMDGPU drivers are not increasing the VRAM allocation even when memory is available?