
[Mac] Model: deepseek-r1-distill-qwen-7b-generic-gpu can't use GPU

Open invent00 opened this issue 7 months ago • 4 comments

Foundry version: 0.3.9267.42993

When I try to generate an answer on an M2 Mac Mini (8 GB RAM), Foundry generates very slowly.

According to asitop, the GPU is not being used for this model.

[screenshot: asitop output]

If I choose phi-3.5-mini, the GPU works properly.

How to reproduce:

foundry model run deepseek-r1-7b

Machine/OS information: M2 Mac Mini, 8 GB RAM

ProductName:            macOS
ProductVersion:         15.5
BuildVersion:           24F74

invent00 avatar May 20 '25 17:05 invent00

If you look in Activity Monitor, what GPU % does InferenceService.Agent show when the model is running? You can also open the GPU History window (under the Window menu) to see overall GPU usage as it runs.

deepseek-r1-7b maxes out the GPU on the Mac Mini M4 that I tested on, and internally we don't do anything different based on the macOS hardware, version, or model if it's a GPU model.

skottmckay avatar May 22 '25 08:05 skottmckay

Hi, thank you for your response.

I checked Activity Monitor with deepseek 7b and phi-3.5-mini. If I choose deepseek 7b, GPU usage stays low (the screenshot below was taken while it was running; sorry for the Japanese locale). [screenshot: Activity Monitor]

If I choose phi-3.5-mini, the GPU shows over 90% utilization. [screenshot: Activity Monitor]

In my understanding, the difference is model size: it's possible that 8 GB of memory is insufficient to load the larger model.
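
A rough back-of-envelope calculation supports this (the parameter counts and quantization below are my own assumptions, not numbers reported by Foundry):

```python
# Back-of-envelope weight-memory estimate. The parameter counts and the
# 4-bit quantization are assumptions, not values reported by Foundry.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# deepseek-r1-distill-qwen-7b: ~7B parameters, assuming 4-bit weights
print(f"deepseek-r1 7B @ 4-bit: ~{weight_gb(7, 4):.1f} GB")   # ~3.5 GB
# phi-3.5-mini: ~3.8B parameters, assuming 4-bit weights
print(f"phi-3.5-mini @ 4-bit: ~{weight_gb(3.8, 4):.1f} GB")   # ~1.9 GB
```

Under those assumptions, weights alone for the 7B model take close to half of the 8 GB unified memory before the KV cache, the runtime, and macOS itself are accounted for, while phi-3.5-mini is roughly half that size.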

invent00 avatar May 24 '25 01:05 invent00

That could be it, given that a machine with 16 GB RAM happily uses the GPU for this model, and there's no model-specific code in the WebGPU execution provider in ONNX Runtime.

skottmckay avatar May 28 '25 03:05 skottmckay

This is a similar issue to #130 and #134, where we need better handling of device memory and then pulling the right model. In this case, if GPU memory < the model's VRAM footprint, we should pull the CPU model.
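
A minimal sketch of that check, with hypothetical names and footprint numbers (not Foundry's actual API or code):

```python
# Hypothetical sketch of the proposed variant-selection logic; names such
# as ModelVariant and pick_variant are illustrative, not Foundry APIs.
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    vram_footprint_gb: float  # estimated device memory the variant needs

def pick_variant(gpu_memory_gb: float,
                 gpu_variant: ModelVariant,
                 cpu_variant: ModelVariant) -> ModelVariant:
    """Pull the GPU build only if it fits in available device memory."""
    if gpu_memory_gb < gpu_variant.vram_footprint_gb:
        return cpu_variant  # GPU memory < model footprint: fall back to CPU
    return gpu_variant

# Example numbers (assumed): only part of an 8 GB unified-memory machine
# is allocatable by the GPU, and the 7B GPU model needs ~6 GB in total.
choice = pick_variant(
    gpu_memory_gb=5.3,
    gpu_variant=ModelVariant("deepseek-r1-7b-generic-gpu", 6.0),
    cpu_variant=ModelVariant("deepseek-r1-7b-generic-cpu", 0.0),
)
print(choice.name)  # -> deepseek-r1-7b-generic-cpu
```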

samuel100 avatar Jun 06 '25 09:06 samuel100