Sean Sheng

Results: 28 issues by Sean Sheng

A long model inference request can time out the current runner request. The runner timeout is currently hardcoded at 5 minutes. We should make the runner timeout value configurable. SageMaker deployment through bentoctl...
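One way the timeout could be made configurable is sketched below. This is only an illustration, not BentoML's implementation: the environment variable name `RUNNER_TIMEOUT_SECONDS` and the helper `resolve_runner_timeout` are hypothetical, and the only value taken from the issue is the 5-minute default.

```python
import math
import os

# The 5-minute value the issue describes as currently hardcoded.
DEFAULT_RUNNER_TIMEOUT_SECONDS = 300.0


def resolve_runner_timeout(env=None):
    """Return the runner timeout in seconds, falling back to the default.

    `env` is any mapping of strings; it defaults to os.environ.
    RUNNER_TIMEOUT_SECONDS is a hypothetical name chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("RUNNER_TIMEOUT_SECONDS")
    if raw is None or raw.strip() == "":
        return DEFAULT_RUNNER_TIMEOUT_SECONDS
    timeout = float(raw)  # raises ValueError on non-numeric input
    if not math.isfinite(timeout) or timeout <= 0:
        raise ValueError("runner timeout must be a positive, finite number")
    return timeout
```

A caller would pass the resolved value to whatever client issues the runner request, so that a long inference can be given, for example, 900 seconds instead of the default.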

Benchmark API server and runner performance against various model and input sizes.

https://github.com/readthedocs/sphinx_rtd_theme/issues/761

documentation

There is increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress requests, and request count, using...

feature
documentation

`OMP_NUM_THREADS` must be set before NumPy is imported for it to take effect. Our current implementation doesn’t guarantee that.
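The ordering constraint can be made explicit with a small guard, sketched here under assumptions. The helper name `set_omp_threads_before_numpy` is hypothetical and is not part of BentoML or NumPy; it only checks whether `numpy` already appears in `sys.modules` before writing the environment variable.

```python
import os
import sys


def set_omp_threads_before_numpy(num_threads, modules=None):
    """Set OMP_NUM_THREADS, refusing to do so once numpy is already imported.

    `modules` defaults to sys.modules; it is a parameter only so the check
    can be exercised without actually importing numpy.
    """
    modules = sys.modules if modules is None else modules
    if "numpy" in modules:
        # Too late: per the issue, the setting would not take effect.
        raise RuntimeError("OMP_NUM_THREADS must be set before numpy is imported")
    os.environ["OMP_NUM_THREADS"] = str(int(num_threads))


# Intended call order in a worker entry point (numpy import left as a comment
# so that this sketch runs even where numpy is not installed):
#
#   set_omp_threads_before_numpy(1)
#   import numpy as np
```

Failing loudly when the order is wrong is a design choice for this sketch; a real implementation might instead log a warning.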

feature

### Feature request

The default scheduling strategy implementation schedules the same number of runner instances (for runners that support `nvidia.com/gpu`) as the number of available GPUs. If multiple types of runners are present...

feature

### Feature request

External modules are currently not pickled with the model by default.

### Motivation

_No response_

### Other

_No response_

feature

Relevant discussions in https://github.com/bentoml/BentoML/issues/666

help-wanted
enhancement
framework

help-wanted
good-first-issue

help-wanted
good-first-issue