DeepSpeed-MII
Limit VRAM usage when serving the model
Is it possible to limit the maximum GPU memory ("max_memory") while serving the model, both for standard and OpenAI-style serving?

The problem is that I have tried to serve the model on two different cards, a 3090 and an RTX 6000 Ada Generation. In both cases, serving the model consumed all of the available VRAM. I want to run an embedding model on the same GPU, but serving leaves no space for it.
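For reference, here is a minimal sketch of the kind of cap I am hoping for, using the non-persistent `mii.pipeline` together with PyTorch's per-process memory fraction. This is a generic PyTorch workaround rather than an MII option, and the model name and fraction below are only placeholders:

```python
# Sketch of a possible workaround, assuming the in-process (non-persistent)
# pipeline. torch.cuda.set_per_process_memory_fraction caps PyTorch's caching
# allocator for this process only; a persistent mii.serve() deployment spawns
# separate worker processes, so the cap would likely not carry over to them.
import torch
import mii

# Assumption: let the LLM use at most ~70% of GPU 0, leaving the rest
# free for the embedding model.
torch.cuda.set_per_process_memory_fraction(0.7, device=0)

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
print(pipe(["DeepSpeed is"], max_new_tokens=32))
```

Even with a cap like this, the serving engine may still try to pre-allocate its KV cache up to (or beyond) the limit and fail if the cap is too tight, so what I am really asking for is a supported `max_memory`-style option in the serving config itself.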