serve
Serve, optimize and scale PyTorch models in production
## Description This PR supports a manually triggerable GitHub Action that will make an official release. The way this works: assuming a code freeze, we will run a bunch...
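For reference, the GitHub Actions mechanism behind a manually triggerable workflow is the `workflow_dispatch` trigger. A minimal sketch of such a release workflow follows, with the file name, input, and steps all assumed (this is not the workflow added by the PR):

```yaml
# .github/workflows/official-release.yml  (hypothetical file name)
name: Official Release
on:
  workflow_dispatch:            # manual trigger from the Actions tab
    inputs:
      release_version:          # hypothetical input; the actual PR may differ
        description: "Version to release, e.g. 0.6.0"
        required: true
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and publish
        run: echo "build/publish steps for ${{ github.event.inputs.release_version }} go here"
```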
### 🚀 The feature We need a feature for sharing a GPU across models. It could be configured by setting 0 < workers < 1 for a model. ### Motivation, pitch...
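Since fractional worker counts do not exist in TorchServe today, any configuration is necessarily hypothetical; a sketch of how the proposal could look in a per-model `model-config.yaml` (the field names follow the existing minWorkers/maxWorkers convention, while the fractional values are the proposed extension):

```yaml
# model-config.yaml (hypothetical: fractional workers are the proposal, not a current option)
minWorkers: 0.5   # proposed: two models configured at 0.5 would share one GPU worker
maxWorkers: 0.5
deviceType: gpu
```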
Fixes #1492 TorchServe defines metrics in a metrics.yaml file, including both frontend metrics (i.e. ts_metrics) and backend metrics (i.e. model_metrics). When TorchServe is started, the metrics definition is loaded in...
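For context, a metrics definition file with both sections has roughly this shape (the metric names, units, and dimensions below are illustrative, not a copy of the shipped metrics.yaml):

```yaml
# metrics.yaml (illustrative excerpt)
ts_metrics:            # frontend metrics
  counter:
    - name: Requests2XX
      unit: Count
      dimensions: [Level, Hostname]
model_metrics:         # backend metrics
  gauge:
    - name: HandlerTime
      unit: ms
      dimensions: [ModelName, Level]
```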
### 🐛 Describe the bug Hello, I am trying to test the error cases using the Postman toolkit. In these cases, I send the same wrong params continuously; however, only...
### 🚀 The feature **Current Setup:** Currently, the TorchServe backend worker process dies whenever it receives an invalid JSON-formatted request. **Feature Requested:** Instead of killing the backend worker process,...
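The requested behavior amounts to catching the parse error per request instead of letting it escape and take down the worker. A minimal sketch in a custom handler (the method shape follows TorchServe's `BaseHandler` convention; the per-request error payload is an assumed convention, not an existing API):

```python
import json
import logging

from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class TolerantHandler(BaseHandler):
    """Sketch: survive malformed JSON bodies instead of dying."""

    def preprocess(self, data):
        # Each element of `data` is a request dict carrying a "data" or
        # "body" key (TorchServe's batching convention). Catch decode
        # errors per request rather than letting them kill the worker.
        parsed = []
        for row in data:
            body = row.get("data") or row.get("body")
            if isinstance(body, (bytes, bytearray)):
                body = body.decode("utf-8", errors="replace")
            try:
                parsed.append(json.loads(body))
            except (json.JSONDecodeError, TypeError) as exc:
                logger.warning("Malformed request body: %s", exc)
                parsed.append({"error": f"invalid JSON: {exc}"})  # assumed error convention
        return parsed
```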
We have 2 ONNX models deployed on a GPU machine built on top of the nightly Docker image. - The first model runs with zero failures at 500 QPS (p99...
## Description This PR adds an example showing how to create and deploy a single-GPU DLRM model with TorchRec. Because the current TorchRec version 0.2.0 requires PyTorch 1.12.0, this...
After looking into #1744, I noticed we don't actually use our `docs/sphinx/requirements.txt` in CI. This is not great because if there are issues with upstream dependencies like `markdown`, it means we...
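The fix is simply to have the docs CI job install from that file, so breakage in pinned upstream dependencies surfaces in CI. A sketch of the step (the workflow layout and build command are assumptions):

```yaml
# excerpt from a docs CI job (hypothetical layout)
steps:
  - uses: actions/checkout@v3
  - name: Install docs dependencies
    run: pip install -r docs/sphinx/requirements.txt
  - name: Build docs
    run: make -C docs html   # assumed build command
```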
### 🚀 The feature Register workflows as part of application startup for immediate access to workflow predictions. ### Motivation, pitch Currently, workflows must be registered via the management API at...
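TorchServe's `config.properties` already has a startup hook for models (`load_models`); the request is the analogous behavior for workflows. A hypothetical sketch (`load_workflows` does not exist; it only names the proposed knob):

```properties
# config.properties
workflow_store=/home/model-server/wf-store   # existing: where .war workflow archives live
load_models=all                              # existing: models registered at startup
load_workflows=all                           # hypothetical: what this feature request asks for
```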
Authors: [Hamid Shojanazeri](https://github.com/HamidShojanazeri), [Shen Li](https://github.com/mrshenli) ## **Problem statement** Currently, TorchServe does not have a general solution for serving large models for inference. The only available support is in Hugging Face (HF) [...