[Mac] Model: deepseek-r1-distill-qwen-7b-generic-gpu can't use GPU
Foundry version: 0.3.9267.42993
When I try to generate an answer on an M2 Mac Mini (8GB RAM), Foundry generates very slowly.
According to asitop, the GPU is not used with this model.
If I choose phi-3.5-mini, the GPU works properly.
How to reproduce:
foundry model run deepseek-r1-7b
Machine/OS information: M2 Mac Mini, 8GB RAM
ProductName: macOS
ProductVersion: 15.5
BuildVersion: 24F74
If you look in Activity Monitor, what GPU % does InferenceService.Agent show while the model is running? You can also open the GPU History window (under the Window menu) to see overall GPU usage as it runs.
deepseek-r1-7b maxes out the GPU on the Mac Mini M4 that I tested on, and we don't do anything different internally based on the macOS hardware, version, or model if it's a GPU model.
Hi, thank you for your response.
I checked Activity Monitor with deepseek-r1-7b and phi-3.5-mini.
If I choose deepseek-r1-7b, GPU usage appears low. (The screenshot below was taken while it was running; sorry for the Japanese locale.)
If I choose phi-3.5-mini, the GPU shows over 90% usage.
In my understanding, the difference is model size: it's possible that 8GB of memory is insufficient to load the larger model.
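As a rough sanity check (the quantization of the Foundry GPU build is an assumption on my part; I'm guessing roughly 4-bit weights), the weights alone of a 7B-parameter model would already take up a large share of 8GB of unified memory:

```python
# Back-of-envelope memory estimate for a 7B-parameter model.
# Assumption: ~4-bit (0.5 byte/param) quantized weights; the actual
# precision of the Foundry GPU build may differ.
params = 7e9
bytes_per_param = 0.5                      # 4-bit quantization
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~3.3 GB

# On top of that come the KV cache, activations, the OS, and other
# processes, all sharing the same 8 GB of unified memory, whereas
# phi-3.5-mini (~3.8B parameters) fits with room to spare.
```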
That could be it, given that a machine with 16GB of RAM happily uses the GPU for this model, and there's no model-specific code in the WebGPU execution provider in ONNX Runtime.
This is a similar issue to #130 and #134, where we need better handling of device memory when pulling the right model. In this case, if the GPU memory is smaller than the model's vRAM footprint, we should pull the CPU model instead.
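A minimal sketch of that fallback logic, assuming a hypothetical catalog of per-variant footprints and a caller that knows the free device memory (none of these names come from the Foundry codebase):

```python
# Hypothetical sketch of the proposed selection logic; the function
# and catalog names are illustrative, not Foundry APIs.

MODEL_CATALOG = {
    # variant name -> approximate vRAM footprint in GB (illustrative numbers)
    "deepseek-r1-distill-qwen-7b-generic-gpu": 5.0,
    "deepseek-r1-distill-qwen-7b-generic-cpu": 0.0,  # CPU variant needs no vRAM
}

def pick_variant(model: str, free_gpu_memory_gb: float) -> str:
    """Pull the GPU variant only if it fits in device memory,
    otherwise fall back to the CPU variant."""
    gpu_variant = f"{model}-generic-gpu"
    cpu_variant = f"{model}-generic-cpu"
    footprint = MODEL_CATALOG.get(gpu_variant)
    if footprint is not None and footprint <= free_gpu_memory_gb:
        return gpu_variant
    return cpu_variant

# Example: on an 8 GB unified-memory Mac where only ~4 GB is
# realistically free for the GPU, the 7B GPU build would be skipped.
print(pick_variant("deepseek-r1-distill-qwen-7b", free_gpu_memory_gb=4.0))
```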