A long model inference request could time out the current runner request. The runner timeout is currently hardcoded at 5 minutes; we should make the runner timeout value configurable. SageMaker deployment through bentoctl...
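A minimal sketch of what a configurable timeout could look like, assuming a hypothetical `BENTOML_RUNNER_TIMEOUT` environment variable (not an existing BentoML setting); in practice this would more likely live in the BentoML configuration file.

```python
import os

# Current behavior keeps the timeout hardcoded at 5 minutes (300 seconds);
# this sketch falls back to that value when no override is provided.
DEFAULT_RUNNER_TIMEOUT_SECONDS = 300


def get_runner_timeout() -> int:
    """Return the runner timeout in seconds, overridable via an env var."""
    # BENTOML_RUNNER_TIMEOUT is an assumed name used for illustration only.
    return int(os.environ.get("BENTOML_RUNNER_TIMEOUT", DEFAULT_RUNNER_TIMEOUT_SECONDS))
```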
Benchmark API server and runner performance against various model and input sizes.
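A simple starting point for such a benchmark could look like the sketch below: send requests of increasing batch size to a locally served endpoint and record latency. The URL and payload format are assumptions for illustration; adjust them to the actual service under test.

```python
import time

import requests  # assumes the `requests` package is installed

# Assumed local BentoML endpoint; replace with the service being benchmarked.
URL = "http://127.0.0.1:3000/predict"

for size in (1, 10, 100, 1000):
    payload = [[0.0] * 4] * size  # batch of `size` dummy feature rows
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=60)
    elapsed = time.perf_counter() - start
    print(f"batch={size:5d} status={resp.status_code} latency={elapsed * 1000:.1f} ms")
```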
https://github.com/readthedocs/sphinx_rtd_theme/issues/761
There is increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress requests, and request count, using...
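An illustrative sketch of what user-defined metrics might look like using the Prometheus client library directly; the metric names, labels, and the `preprocess`/`run_model` helpers are made up for the example and are not part of BentoML's built-in metric set.

```python
from prometheus_client import Counter, Histogram

# Custom, user-defined metrics alongside the built-in service-level ones.
INFERENCE_ERRORS = Counter(
    "my_service_inference_errors_total",
    "Number of failed inference calls",
    ["model_name"],
)
PREPROCESS_SECONDS = Histogram(
    "my_service_preprocess_seconds",
    "Time spent in input preprocessing",
)


def predict(model_name: str, raw_input):
    with PREPROCESS_SECONDS.time():
        features = preprocess(raw_input)  # hypothetical helper
    try:
        return run_model(model_name, features)  # hypothetical helper
    except Exception:
        INFERENCE_ERRORS.labels(model_name=model_name).inc()
        raise
```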
`OMP_NUM_THREADS` must be set before numpy is imported for it to take effect; our current implementation doesn't guarantee that.
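A minimal illustration of the ordering constraint: the environment variable has to be exported before the first `import numpy`, e.g. at the very top of the entry-point module.

```python
import os

# OMP_NUM_THREADS only takes effect if it is set before numpy (and its BLAS
# backend) is first imported, so it must come before the import below.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import numpy as np  # noqa: E402  (import intentionally placed after the env var)

print(np.__version__)
```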
### Feature request
The default scheduling strategy schedules as many instances of each runner (with `nvidia.com/gpu` support) as there are available GPUs. If multiple types of runners are present...
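A hypothetical sketch of splitting the available GPUs across several runner types instead of giving every runner one instance per GPU; the runner names and the round-robin policy are assumptions for illustration only, not BentoML's scheduling API.

```python
from typing import Dict, List


def assign_gpus(runner_names: List[str], gpu_count: int) -> Dict[str, List[int]]:
    """Round-robin the GPU device indices over the runner types."""
    assignment: Dict[str, List[int]] = {name: [] for name in runner_names}
    for gpu_index in range(gpu_count):
        runner = runner_names[gpu_index % len(runner_names)]
        assignment[runner].append(gpu_index)
    return assignment


# e.g. two runner types sharing four GPUs:
# {'encoder': [0, 2], 'classifier': [1, 3]}
print(assign_gpus(["encoder", "classifier"], 4))
```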
### Feature request
External modules are currently not pickled with the model by default.
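One way this could work is cloudpickle's pickle-by-value registration (available in cloudpickle 2.0+), sketched below; `my_feature_lib` and its `preprocess` function are placeholder names for a project-local module that would otherwise only be referenced by name in the pickle.

```python
import cloudpickle

import my_feature_lib  # hypothetical project-local module, used for illustration

# cloudpickle normally serializes objects from importable modules by reference
# (module path + name); registering the module by value embeds its code in the
# pickle so the consumer does not need the module installed.
cloudpickle.register_pickle_by_value(my_feature_lib)

blob = cloudpickle.dumps(my_feature_lib.preprocess)  # hypothetical function
```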
Relevant discussions in https://github.com/bentoml/BentoML/issues/666