How is the order determined for loading a model onto a specific device?
How does Triton Server determine the order in which models are loaded onto a specific device, such as a GPU? For instance, if there isn't enough VRAM available for all models, how does the server decide which models to load on the GPU and which ones fall back to the CPU?