How is the order determined for loading a model onto a specific device?

Open mhbassel opened this issue 6 months ago • 0 comments

How does Triton Server determine the order in which models are loaded onto a specific device, such as a GPU? For instance, if there isn't enough VRAM for all models, how does the server decide which models are loaded on the GPU and which ones fall back to the CPU?

mhbassel • Aug 21 '24 14:08