Specify model size in the InferenceService CRD
It would be nice to have a new parameter in the InferenceService CRD that allows users to specify the model size (in bytes), avoiding the MODEL_MULTIPLIER factor used to estimate it.
Is your feature request related to a problem? If so, please describe.
The heuristic used to estimate a model's memory footprint (model size on disk * MODEL_MULTIPLIER) is not always accurate: the memory a model actually uses on a GPU can be larger, which can lead to OOM errors. Because of this, the total number of models that can stay loaded on the GPU is not estimated correctly.
We have already faced this issue using Triton as the serving runtime.
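To make the gap concrete, the current estimate is

$$
\text{estimatedSize} = \text{sizeOnDisk} \times \texttt{MODEL\_MULTIPLIER}
$$

With an illustrative multiplier of 1.25 (not necessarily the configured default), a model that is 1 GiB on disk would be budgeted 1.25 GiB, while its real GPU footprint (weights plus CUDA context and runtime workspace) can be noticeably larger, so too many models get packed onto the same GPU.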
Describe your proposed solution
A new parameter in the InferenceService CRD that allows users to specify the model size directly, avoiding the MODEL_MULTIPLIER factor used to estimate it.
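A minimal sketch of what this could look like on an InferenceService (the modelSize field name and placement are hypothetical, not an existing API; modelFormat and storageUri are the standard KServe predictor fields, and the storage URI is illustrative):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-onnx-model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      storageUri: s3://my-bucket/models/example
      # Hypothetical new field: exact memory footprint in bytes.
      # When set, ModelMesh would use this value directly instead of
      # sizeOnDisk * MODEL_MULTIPLIER when packing models onto GPUs.
      modelSize: 2147483648  # 2 GiB
```

An alternative design would be to accept a Kubernetes resource.Quantity (e.g. "2Gi") instead of raw bytes, for consistency with how resource requests are expressed elsewhere in the spec.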