DeepSpeed-MII

Limit VRAM usage in serving the model

Open risedangel opened this issue 10 months ago • 2 comments

Is it possible to limit "max_memory" while serving the model?

risedangel — Mar 31 '24 20:03

Both for standard and OpenAI-compatible serving.

risedangel — Mar 31 '24 20:03

The problem is, I have tried to serve the model on two different cards: a 3090 and an RTX 6000 Ada Generation. Model serving ate up all the VRAM in both scenarios. I want to run an embedding model on the same GPU, but it leaves no space.

risedangel — Apr 01 '24 18:04
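One possible workaround, not confirmed by the thread: PyTorch's caching allocator can be capped per process with `torch.cuda.set_per_process_memory_fraction` before the model is loaded, which may leave headroom for a second model on the same GPU. This is a sketch under assumptions — MII (or its KV-cache pre-allocation) may reserve memory outside this cap, and the `cap_vram` helper below is hypothetical, not an MII API:

```python
# Sketch: cap this process's share of VRAM via PyTorch's allocator limit.
# Assumption: this is called before the serving process loads the model;
# MII itself may still allocate memory that this cap does not govern.
import torch


def cap_vram(fraction: float, device: int = 0) -> float:
    """Limit the CUDA caching allocator to `fraction` of the device's VRAM.

    `cap_vram` is a hypothetical helper, not part of DeepSpeed-MII.
    Returns the fraction so callers can log or verify it.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if torch.cuda.is_available():
        # Real PyTorch API: restricts allocations by this process's allocator.
        torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction


# Example: reserve ~30% of the GPU for an embedding model, give ~70% to MII.
cap_vram(0.7)
```

Exceeding the cap raises an out-of-memory error instead of silently consuming the rest of the card, so the serving process fails fast rather than starving the embedding model.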