serve
Serve, optimize and scale PyTorch models in production
### 📚 The doc issue

In the default image processing handler, images are processed one by one (https://github.com/pytorch/serve/blob/a4d5090e114cdbeddf5077a817a8cd02d129159e/ts/torch_handler/vision_handler.py#L38) and the processing runs synchronously. **What is the best way to optimize it, should...
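The issue text is truncated above; as a rough illustration of one direction that is often tried, the sketch below keeps the per-image work to decoding and resizing and applies normalization once to the stacked batch, optionally on GPU. The `preprocess` signature mirrors the vision handler's, but the body is illustrative rather than the repository's code.

```python
# Sketch of a batched preprocess (illustrative, not the handler's actual code):
# decode and resize per image, then stack and normalize the whole batch at once.
import io

import torch
from PIL import Image
from torchvision import transforms
import torchvision.transforms.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Normalization applied once to the stacked 4D tensor instead of per image.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])


def preprocess(data):
    tensors = []
    for row in data:
        payload = row.get("data") or row.get("body")
        image = Image.open(io.BytesIO(payload)).convert("RGB")
        # Keep the per-image work minimal: decode + resize only.
        tensors.append(F.to_tensor(F.resize(image, [224, 224])))
    batch = torch.stack(tensors).to(device)
    return normalize(batch)
```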
## Description

Supports an array of parameters. This might be useful in cases where the model expects a list of images instead of a single image. It could also be...
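As a hedged sketch of what a client call might look like if the handler were extended this way: the endpoint name `my_model` and the `{"images": [...]}` JSON shape below are assumptions for illustration, not part of the proposal itself.

```python
# Hypothetical client request sending several base64-encoded images in one
# JSON body. Endpoint name and schema are illustrative assumptions.
import base64

import requests

payload = []
for path in ["img0.jpg", "img1.jpg", "img2.jpg"]:
    with open(path, "rb") as f:
        payload.append(base64.b64encode(f.read()).decode("utf-8"))

resp = requests.post(
    "http://localhost:8080/predictions/my_model",  # default inference port
    json={"images": payload},
)
print(resp.json())
```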
### 🚀 The feature

Wrote this after an offline discussion with @lxning. We've recently standardized on using `benchmarks/benchmark-ab.py` as **the** preferred way to benchmark TorchServe models and used it to...
Can I run two YOLO models in parallel on more than one GPU? What would be the best way to optimize the ensemble of two models? Thank you.
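One possible setup, sketched under assumptions: register the two model archives through TorchServe's management API (port 8081 by default) and let TorchServe spread the workers across the available GPUs via `number_of_gpu` in `config.properties`. The `.mar` file names below are placeholders.

```python
# Sketch: register two model archives via TorchServe's management API.
# Worker-to-GPU placement is handled by TorchServe based on config.properties.
import requests

MANAGEMENT = "http://localhost:8081"

for mar in ("yolo_detector_a.mar", "yolo_detector_b.mar"):  # placeholder names
    resp = requests.post(
        f"{MANAGEMENT}/models",
        params={"url": mar, "initial_workers": 2, "synchronous": "true"},
    )
    print(mar, resp.status_code, resp.text)
```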
From the doc here (https://github.com/pytorch/serve), it says TorchServe is a tool for serving PyTorch models in production. **I am wondering, in theory, if we can expect to have a better...
I am using TorchServe to potentially serve a model from MMOCR (https://github.com/open-mmlab/mmocr), and I have several questions:

1. I tried to do inference on hundreds of images together using batch...
## Description

Please read our [CONTRIBUTING.md](https://github.com/pytorch/serve/blob/master/CONTRIBUTING.md) prior to creating your first pull request. Please include a summary of the feature or issue being fixed. Please also include relevant motivation and...
Hey everyone! As I was reviewing the code as part of a PR I'm working on, I found lines in the `pytest` tests that use the `open()` function without a `with`, and...
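For context, the pattern in question and the usual fix look roughly like this (the file name is illustrative):

```python
# The pattern flagged in the tests: the handle is only closed if nothing
# raises before close().
f = open("sample.txt")
data = f.read()
f.close()

# Same read with a context manager: the handle is closed even on exceptions.
with open("sample.txt") as f:
    data = f.read()
```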
This PR:

1. Updates the IPEX integration in TorchServe, following #1631. As described in #1631, **model optimization now** looks like:

```python
# in base_handler.py
if ipex_enabled:
    self.model = self.model.to(memory_format=torch.channels_last)
    self.model = ipex.optimize(self.model)
```
...
We have been observing that TorchServe preprocessing time for image classification is a bottleneck: preprocessing takes a very long time (longer than the actual inference itself). You...
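Since the rest of the post is truncated here, a quick way to verify the split between preprocessing and inference is to time the two stages inside a custom handler, roughly as sketched below. The subclass is illustrative, assumes the stock `ImageClassifier` handler, and its simplified `handle()` skips some of `BaseHandler`'s extra bookkeeping.

```python
# Illustrative subclass that logs how long preprocessing and inference each
# take per batch (simplified handle(), for measurement only).
import logging
import time

from ts.torch_handler.image_classifier import ImageClassifier

logger = logging.getLogger(__name__)


class TimedImageClassifier(ImageClassifier):
    def handle(self, data, context):
        t0 = time.perf_counter()
        batch = self.preprocess(data)
        t1 = time.perf_counter()
        output = self.inference(batch)
        t2 = time.perf_counter()
        logger.info(
            "preprocess %.1f ms, inference %.1f ms",
            (t1 - t0) * 1e3,
            (t2 - t1) * 1e3,
        )
        return self.postprocess(output)
```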