seldon-core
Add support for /v2/models/{model name}/config endpoint in Executor
This is part of the Triton extensions: https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md
I think there are some things we need to consider before adding this:
- If this is just Triton it will be confusing to users if MLServer does not support this. @adriangonz
- How should this be interpreted in an inference graph of multiple components or chained V2 models.
- It needs to be valid for REST and gRPC
- How should we interpret this for the Seldon or Tensorflow protocols?
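For context, the difference between the two endpoints under discussion can be sketched with plain HTTP calls. This is a minimal sketch only; the host, port and model name are placeholders, assuming a Triton server reachable over HTTP:

```python
# Sketch only: compares the standard V2 metadata endpoint with Triton's
# /config extension over plain HTTP. Host, port and model name are placeholders.
import requests

BASE = "http://localhost:8000"   # assumed Triton HTTP endpoint
MODEL = "my-model"               # hypothetical model name

# Standard V2 dataplane: model metadata (name, versions, inputs, outputs).
metadata = requests.get(f"{BASE}/v2/models/{MODEL}").json()
print(metadata.get("inputs"), metadata.get("outputs"))

# Triton-specific extension: the full model configuration (instance groups,
# dynamic batching, warmup, ...). Not part of the core V2 protocol.
config = requests.get(f"{BASE}/v2/models/{MODEL}/config").json()
print(config)
```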
@cliveseldon Those are good questions which you may decide on internally but I'll share my use-case for this fix. We have some users who currently use Triton and we want to move them to Seldon-Triton instead to get Seldon's features. The issue is that in their client logic, they check the model's config to know what type of inputs to send, which doesn't work over Seldon. They can send the request directly to the Triton service but that will require some refactoring of their client which is an issue right now for us.
Is there any way that this could be fixed for the next version (either with my PR or not), even if not all of these questions have a definitive answer by the time of the release?
Hey @naor2013, given that the use case seems mainly to fall on reading the model's inputs, wouldn't it be enough to check the metadata endpoint (i.e. GET /v2/models/{model name})? This endpoint is part of the standard V2 dataplane and (should?) be exposed already both in MLServer and Triton, but also at the Seldon Core's executor level.
Would that be an acceptable (i.e. simple and valid) workaround?
@adriangonz thanks for the response. We have 2 issues with this approach; one is solvable on our end, but the other one, not so much.
The first issue is that our users use the Triton client, which calls the /config path internally. We could change that by removing the use of the Triton client and calling the path ourselves. For the long term we'd rather not do that, but as a short-term solution it is possible.
The other issue, which is a Seldon issue, is when we use multiple models in one Seldon route. Seldon has an internal check to see if the model name is part of the SeldonDeployment yaml, but Triton can serve multiple models behind a single entry in the Seldon yaml. This causes an error when a user tries to get a model's configuration using the path you suggested, if the model name is known to Triton but not to Seldon.
Hello @adriangonz and @cliveseldon. It's very important for us that this issue is fixed soon (as I mentioned, it prevents our users from moving to Seldon). It would be great if you could review my PR and help me finish it so it gets into a new version as soon as possible. Another issue that is very important to us (related to this one) is that some routes don't work when one Triton server has multiple models that Seldon is not aware of (described in my message above). Let me know if I should open a new issue for that.
Hey @naor2013,
On the first point you raise, it does feel like the metadata endpoint should be the right one to call for this use case. The main reason is that the metadata endpoint is part of the standard V2 inference protocol (as opposed to the /config endpoint, which is a Triton-specific extension). Therefore, it will be supported by every V2-compatible inference server (e.g. MLServer).
Could you provide more details on why the /metadata endpoint wouldn't work with tritonclient? I'm not that familiar with the library, but it feels like it is exposed there (under the get_model_metadata() method).
On the other hand, on the second point you raise, looking at the Triton examples in the Seldon Core docs, it does feel like it should be already supported. Have you tried sending a request as it's outlined on that example?
Hi @adriangonz
Thanks for the answer.
First of all, there is a difference between /config and /metadata.
In Triton, /metadata gives you the major parts of your model, so mostly the name, inputs and outputs.
But /config gives you the entire config info, including any special Triton features like model warmup.
For most cases, /metadata is enough, but if you use Triton's tools, for example their model analyzer, it does use the /config route to get more info on the model. It can be found here.
Also, AFAIK the V2 inference protocol is the base protocol that all tools use, but it can be extended for specific use-cases, as Triton does. Since Seldon explicitly supports Triton, I would assume Seldon supports the base V2 inference protocol plus any relevant extensions for the tools it supports, like Triton's.
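For illustration, here is a rough client-side sketch of the two calls using tritonclient over HTTP; the server URL and model name are placeholders:

```python
# Rough sketch using tritonclient over HTTP; URL and model name are placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Standard V2 metadata: name, versions, platform, inputs and outputs.
metadata = client.get_model_metadata("my-model")

# Triton extension: the full model configuration (warmup, batching, etc.),
# which Triton's own tooling such as the model analyzer relies on.
config = client.get_model_config("my-model")

print(metadata)
print(config)
```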
About the second point, the link you sent shows how to use /infer when you have multiple models in Triton. That does indeed work. But using other routes like /metadata and some others does not work, because Seldon takes the model name from the route and checks whether it knows that model name based on the yaml; if it does not, it returns an error. This check does not happen for /infer, and shouldn't happen at all for any Triton service, since Seldon can't be expected to be aware of all the model names that exist inside the Triton service.
The checks I refer to can be found here.
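To make the failure mode concrete, a hedged sketch of the requests involved; the ingress host, namespace, deployment name and model name below are all hypothetical:

```python
# Hedged sketch of the behaviour described above; the host, namespace,
# deployment name and model name are all hypothetical.
import requests

# Typical Seldon Core v1 ingress path: /seldon/<namespace>/<deployment>/...
BASE = "http://localhost:8003/seldon/seldon/triton-sdep"

# Per the discussion above, /infer is proxied to Triton even for models the
# executor does not know about (payload omitted here for brevity):
#   requests.post(f"{BASE}/v2/models/extra-model/infer", json={...})

# ...but metadata for a model that exists in Triton's repository yet is not
# listed in the SeldonDeployment graph is rejected by the executor's check.
resp = requests.get(f"{BASE}/v2/models/extra-model")
print(resp.status_code, resp.text)
```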
I hope we can resolve this quickly so our users can start using Seldon as soon as possible. :)
Thinking about this one, an alternative short-term workaround could be to disable the executor on your SeldonDeployment manifest. This would give you direct access to Triton, which could be a good fit for your use case.
Is this something that you could try out @naor2013 ?
@adriangonz That idea came up for us as well, but we try to utilize Seldon's features as much as possible, so we would lose on that front. We already have a "good enough" short-term solution: use /metadata, skip Triton's model analyzer for now (which is a big loss for us), and, most importantly, avoid multiple models on the same Triton server. This does work, but we would like to change it as soon as we can, which is why I'm trying to see if we can work on a long-term solution quickly.
My PR solves our specific issues, since it adds the config route and doesn't have the multi-model bug in it, but I understand it might not be good enough yet, so it would be great if, with your guidance, I could fix the PR to be more suitable for what you think should be part of Seldon's code.
Let me know :)
Hey @naor2013 ,
We definitely see the value in enabling Triton's perf analyser to work with the executor. However, my main concern would be that exposing a Triton-specific endpoint (which deviates from the V2 protocol) on the top-level API of the executor could break other V2-protocol inference servers. On top of this, Triton's config format is very specific to Triton, so it wouldn't be straightforward to extend other inference servers with their own /config endpoint.
I'm wondering whether an alternative solution could be to forward any V2 model paths unknown to the executor, so that the underlying inference server can decide whether it supports them or not. Sort of propagating all /v2/models/<name>/* paths downstream (other than the /infer endpoint, which the executor needs to have control over). That way, the executor would remain decoupled from the particular details of the underlying inference servers.
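To make the routing rule concrete, here is a toy sketch in Python; this is not how the executor is implemented (the real executor is written in Go), and the upstream address is a placeholder:

```python
# Toy illustration of the proposed routing split, not the executor itself
# (the real executor is written in Go). The upstream address is a placeholder.
from flask import Flask, Response, request
import requests

app = Flask(__name__)
UPSTREAM = "http://localhost:8000"  # assumed address of the inference server

@app.route("/v2/models/<name>/infer", methods=["POST"])
def infer(name):
    # The orchestrator keeps control of /infer (graph routing, combiners,
    # metrics, ...); in this toy sketch we simply don't proxy it.
    return Response("handled by the orchestrator", status=501)

@app.route("/v2/models/<name>", defaults={"rest": ""})
@app.route("/v2/models/<name>/<path:rest>")
def passthrough(name, rest):
    # Any other /v2/models/<name>/* path is forwarded downstream untouched,
    # so server-specific extensions such as Triton's /config keep working.
    upstream = requests.get(f"{UPSTREAM}{request.full_path.rstrip('?')}")
    return Response(upstream.content, status=upstream.status_code,
                    content_type=upstream.headers.get("content-type"))
```

The point is just the split: /infer stays under the orchestrator's control, while everything else under /v2/models/<name>/ is passed through, so the executor never has to know about server-specific extensions.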
What would you think of that approach @naor2013 ?
Thanks for the response @adriangonz. This implementation should solve both of our problems for sure (the /config endpoint and the Triton multiple-models issue). Alternatively, adding /config exclusively for Triton models and skipping the model-name check for Triton backends would also solve it. Either works; my concern, as I said before, is that this blocks some of our users from moving to Seldon, so we really hope it's something that can be solved quickly (I can submit another pull request if you think that would help).
I see that @cliveseldon moved this issue from 1.13 to V2 (and I don't know what the timeline for that is).
Hey @naor2013 ,
Apologies for the delay in getting back to you.
We do see the value in supporting the use of Triton's perf tool directly on a SeldonDeployment. Therefore, given that you've already submitted a PR, I think it could make sense to go with that approach as a stopgap solution. I'll try to have a look at the changes today, and hopefully once it's reviewed it should be good to go ahead! :+1:
For the mid term though, I do still think that we should move towards a Triton-agnostic solution. E.g. forwarding unknown endpoints to the underlying inference servers.