Unable to load/unload models through SageMaker + Triton
Description sagemaker_server.cc exposes loading/unloading models through an HTTP POST request to SageMaker. I'm unable to load or unload models through SageMaker for Triton. I'm currently testing locally, but eventually want to run the same setup on a SageMaker endpoint.
I opened this issue on this repo because sagemaker_server.cc lives here, even though the problem is directly linked to SageMaker.
Triton Information 22.12-pyt-python-py3
Are you using the Triton container or did you build it yourself?
Using a Triton container, with the necessary predefined environment variables for SageMaker, and exposing port 8080 for the SageMaker server. We also set SAGEMAKER_MULTI_MODEL=true, as required for listing models.
To Reproduce
tritonserver --log-verbose=true --allow-sagemaker=true --allow-grpc=false --allow-http=true --allow-metrics=true --model-control-mode=explicit --model-repository /opt/ml/models
Running the SageMaker test for Triton + SageMaker (this test) fails.
Addressing SageMaker directly through port 8080, according to this documentation, loading a model expects the following request:
POST http://localhost:8080/models/
Content-Type: application/json
{
"model_name": "my-model",
"url": "/opt/ml/models/<my-hashed-name>/model"
}
Returns the following error:
{"error":"failed to register '', repository not found"}
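For completeness, the failing call can be scripted. This is a sketch using only Python's standard library, keeping the placeholder path from above and assuming the container is listening locally on port 8080:

```python
import json
import urllib.error
import urllib.request

# The same body as the POST above; the model path is still a placeholder.
PAYLOAD = {
    "model_name": "my-model",
    "url": "/opt/ml/models/<my-hashed-name>/model",
}

def try_load(base="http://localhost:8080"):
    """POST the load request and return the response body, even on error."""
    req = urllib.request.Request(
        base + "/models/",
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        # With the setup above, this comes back as:
        # {"error":"failed to register '', repository not found"}
        return err.read().decode()

# print(try_load())  # run against the container started with the command above
```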
Expected behavior: loading/unloading models through SageMaker works as expected.
cc @dyastremsky @rmccorm4
@jadhosn
Could you share more about the objective you are trying to achieve, and also the exact failure you are seeing?
Note that in MME mode, SageMaker handles model loading and unloading on behalf of the customer; the customer can only choose to create, invoke, or delete the endpoint. A model is loaded on demand when the endpoint is invoked with TargetModel=xyz.tar.gz.
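In that mode, loading is implicit in the invoke call rather than a separate API. A sketch of what that looks like with boto3 (the endpoint name and payload shape here are hypothetical; `xyz.tar.gz` is the archive name from the comment above):

```python
import json

# Parameters for sagemaker-runtime's invoke_endpoint call. In MME mode,
# SageMaker loads xyz.tar.gz on demand the first time it is targeted.
invoke_kwargs = dict(
    EndpointName="my-endpoint",       # hypothetical endpoint name
    TargetModel="xyz.tar.gz",         # model archive under the MME S3 prefix
    ContentType="application/json",
    Body=json.dumps({"inputs": []}),  # payload shape depends on your model
)

# Requires boto3 and AWS credentials, so left commented out here:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**invoke_kwargs)
# print(response["Body"].read())
```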
@nskool, I'm looking for the ability to unload models from GPU memory on demand on SageMaker. Triton already supports loading and unloading models through POST requests. For clarity, I'm not referring to deleting the downloaded model from temporary storage; I want to explicitly unload a model from GPU memory, through SageMaker.
@jadhosn I've got the solution
Run docker like so:
docker run --rm --net=host -v ${PWD}/repos:/repos triton_server_aws_cpu:0.0.1 tritonserver --log-verbose=true --allow-sagemaker=true --http-port=23000 --allow-grpc=false --allow-http=true --allow-metrics=true --model-control-mode=explicit --model-repository /tmp
Note, --model-repository is required, so I just set it to /tmp; in the docker/sagemaker/serve script, they set it to a dummy path.
Second note: you want to map your local repos directory into the Docker container, which is what -v ${PWD}/repos:/repos is for. Pay attention to this when looking at the JSON for the POST request below.
It looks like the way SageMaker Triton works, a new model repository is used for each model.
So the way to load your models would be, e.g.:
POST http://localhost:8080/models
JSON:
{
"model_name": "gustavosta-magicprompt-stable-diffusion",
"url": "/repos/gustavosta-magicprompt-stable-diffusion"
}
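Wrapping the load call (and the corresponding unload) in a small script: this is a sketch using only Python's standard library, assuming the DELETE /models/<model_name> route from the SageMaker multi-model container contract for unloading:

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # SageMaker serving port inside the container

def make_load_request(name, repo_url):
    # POST /models with the model name and its per-model repository path.
    body = json.dumps({"model_name": name, "url": repo_url}).encode()
    return urllib.request.Request(
        BASE + "/models",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def make_unload_request(name):
    # DELETE /models/<name> unloads the model (MME container contract).
    return urllib.request.Request(BASE + "/models/" + name, method="DELETE")

def send(req):
    # Fire the request and return the decoded response body.
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Usage against a locally running container:
#   send(make_load_request("gustavosta-magicprompt-stable-diffusion",
#                          "/repos/gustavosta-magicprompt-stable-diffusion"))
#   send(make_unload_request("gustavosta-magicprompt-stable-diffusion"))
```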
Your local repos folder then contains one directory per model (e.g. repos/gustavosta-magicprompt-stable-diffusion).
Check out docker/sagemaker/serve and src/sagemaker_server.cc if you need to reverse engineer anything further.
FYI, the HTTP server is not required; I have it there because I was doing some experiments.