Does TorchServe have better performance than calling PyTorch directly?

Open Hegelim opened this issue 3 years ago • 4 comments

From doc here: https://github.com/pytorch/serve

It says TorchServe is a tool for serving PyTorch models in production.

I am wondering whether, in theory, we can expect better performance (in terms of speed, GPU memory, and system memory usage) when doing inference on PyTorch models with TorchServe versus without it.

I have been searching online for a satisfying answer with a detailed comparison between the two, but oddly I couldn't find any.

For example, say I have a pretrained PyTorch model saved as a checkpoint. I can either load this model and do inference directly, or serve the model with TorchServe and do inference through the REST API. Which is faster? If TorchServe should be faster, what does it use under the hood to deliver that performance? Maybe distributed computing? I am asking because my concern is that TorchServe is primarily for serving the model; it does not fundamentally change the form of the model, so we cannot expect a performance boost.
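
For concreteness, here is a minimal sketch of the two paths I'm comparing; `MyModel`, the checkpoint path, the sample input, and the registered model name `my_model` are all placeholders for my actual setup:

```python
import torch
import requests

# Path 1: load the checkpoint and run inference in-process.
# MyModel, checkpoint.pt, and input_tensor are placeholders.
model = MyModel()
model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
model.eval()
with torch.no_grad():
    output = model(input_tensor)

# Path 2: call a running TorchServe instance over its REST inference API
# (assumes the model was archived into a .mar and registered as "my_model").
with open("sample_input.json", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/my_model", data=f)
prediction = resp.json()
```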

Any explanation is appreciated.

Hegelim avatar Jun 23 '22 18:06 Hegelim

If you're running inference on a single model with a single worker, then not using any framework will likely be the fastest. The benefits of TS come in when you're managing multiple workers per model or multiple models. But TS is also about integrations with Kubernetes and Docker, plus management and metrics APIs with exporters to make your models prod ready. We also try to include reasonable defaults or collaborate with hardware providers so you can get better out-of-the-box performance.
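
As a rough sketch of what that worker management looks like (assuming default ports and a placeholder `my_model.mar` archive):

```python
import requests

MANAGEMENT = "http://localhost:8081"  # TorchServe management API (default port)

# Register a model archive and start 4 workers for it (placeholder .mar name).
requests.post(
    f"{MANAGEMENT}/models",
    params={"url": "my_model.mar", "initial_workers": 4, "synchronous": "true"},
)

# Scale the same model to 8 workers without restarting the server.
requests.put(f"{MANAGEMENT}/models/my_model", params={"min_worker": 8})

# Metrics are exposed separately in Prometheus format (default port 8082).
metrics = requests.get("http://localhost:8082/metrics").text
```

Each worker is its own process with its own copy of the model, which is where the parallel-throughput benefit comes from.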

msaroufim avatar Jun 24 '22 16:06 msaroufim

I see. For managing multiple workers, does that only work if the model itself has the capacity to support multiple workers?

Hegelim avatar Jul 05 '22 18:07 Hegelim

There's nothing you usually need to do to make your model work with multiple workers. The only limitation is that if your model already does some crazy multiprocessing of its own, it won't play too well with TS.

msaroufim avatar Jul 05 '22 19:07 msaroufim

Thanks! Just to confirm again, by multiple workers, do you mean multiple GPUs?

Hegelim avatar Jul 05 '22 22:07 Hegelim