
Enable multiple serving signatures, as TF Serving does

Open piEsposito opened this issue 3 years ago • 2 comments

🚀 The feature

It would be great if, for a given handler, we could define multiple serving signatures and bind them to specific model endpoints. That way, we could use the same model in different ways without loading it twice into memory.

Motivation, pitch

Loading very large models (language models, for example) into memory is expensive, yet today it is necessary whenever the client needs to call the same model in different ways. An example is a single language model used for both text generation and text embeddings: loading it twice just to use the same backbone in two ways is wasteful.

Alternatives

  • Loading the model twice in memory, but this is not optimal for very large models
  • Loading parts of the model as separate models and calling them in sequence for different endpoints, but this sometimes requires model surgery, which adds too much complexity to development and the workflow.

Additional context

No response

piEsposito, Aug 23 '22 14:08

Thanks @piEsposito. It would be great if you could give a specific example you have in mind and what it would look like. If you have an example of this in another framework, please share that as well.

agunapal, Aug 23 '22 17:08

In TF Serving you can define multiple functions on the model and bind them to endpoints when serving. That way, you can call the same model in different ways (using different parts of the graph, for example).

So we could have a decorator or a config on the handler that lets the developer define which of its methods is called for each endpoint of the server.

TF Serving's approach is documented here: https://www.tensorflow.org/tfx/serving/signature_defs
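A minimal sketch of the decorator idea above, in plain Python. `signature`, `MultiSignatureHandler`, `LanguageModelHandler`, and `ToyBackbone` are hypothetical names chosen for illustration, not existing TorchServe APIs; the toy backbone stands in for a real model, and mapping an HTTP endpoint to a signature name is left out. The point is that the backbone is constructed once and every signature reuses it.

```python
from typing import Any, Callable, Dict, List


def signature(name: str) -> Callable:
    """Hypothetical decorator: tag a handler method as the entry point for one endpoint."""
    def mark(method: Callable) -> Callable:
        method._signature_name = name
        return method
    return mark


class MultiSignatureHandler:
    """Hypothetical base class: collects @signature methods and routes requests to them."""

    def __init__(self) -> None:
        self.signatures: Dict[str, Callable] = {}
        # Scan the class for methods tagged by @signature and bind them by name.
        for attr in dir(type(self)):
            member = getattr(self, attr)
            name = getattr(member, "_signature_name", None)
            if name is not None:
                self.signatures[name] = member

    def handle(self, signature_name: str, data: Any) -> Any:
        # Dispatch a request to the method registered under signature_name.
        if signature_name not in self.signatures:
            raise KeyError(f"unknown signature: {signature_name}")
        return self.signatures[signature_name](data)


class ToyBackbone:
    """Stand-in for an expensive model; counts how many times it was loaded."""
    instances = 0

    def __init__(self) -> None:
        ToyBackbone.instances += 1

    def encode(self, text: str) -> List[float]:
        # Toy "embedding": character count and word count.
        return [float(len(text)), float(text.count(" ") + 1)]

    def decode(self, vector: List[float]) -> str:
        return f"<generated from {int(vector[0])} chars>"


class LanguageModelHandler(MultiSignatureHandler):
    def __init__(self) -> None:
        # The backbone is loaded once and shared by every signature.
        self.backbone = ToyBackbone()
        super().__init__()

    @signature("embed")
    def embed(self, texts: List[str]) -> List[List[float]]:
        return [self.backbone.encode(t) for t in texts]

    @signature("generate")
    def generate(self, prompts: List[str]) -> List[str]:
        return [self.backbone.decode(self.backbone.encode(p)) for p in prompts]


handler = LanguageModelHandler()
print(handler.handle("embed", ["hello world"]))  # [[11.0, 2.0]]
print(handler.handle("generate", ["hi"]))        # ['<generated from 2 chars>']
```

A config file mapping endpoint names to method names would work the same way; the decorator just keeps that mapping next to the methods it describes.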

piEsposito, Aug 24 '22 13:08