server How is the order determined for loading a model onto a specific device?

Open mhbassel opened this issue 6 months ago • 0 comments

How does Triton Server determine the order in which models are loaded onto a specific device, such as a GPU? For instance, if there isn't enough VRAM available for all models, how does the server decide which models to load on the GPU and which ones fall back to the CPU?

Aug 21 '24 14:08 mhbassel
