BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

244 BentoML issues, sorted by recently updated

Support using gRPC instead of the HTTP API for sending prediction requests. When an API model server is deployed as a backend service, many teams prefer gRPC over HTTP. See...

feature
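A minimal sketch of what a gRPC prediction call could look like from the client side; `prediction_pb2`/`prediction_pb2_grpc` are hypothetical stubs generated from an assumed `prediction.proto`, not an existing BentoML API:

```python
# Hypothetical sketch only: assumes stubs generated from a prediction.proto
# that defines a PredictionService with a Predict RPC; BentoML's eventual
# gRPC surface may look different.
import grpc

import prediction_pb2        # assumed generated message module
import prediction_pb2_grpc   # assumed generated service module


def predict(target: str = "localhost:3000") -> None:
    # Open an insecure channel to the model server and issue one RPC.
    with grpc.insecure_channel(target) as channel:
        stub = prediction_pb2_grpc.PredictionServiceStub(channel)
        request = prediction_pb2.PredictRequest(inputs=[1.0, 2.0, 3.0])
        response = stub.Predict(request, timeout=10.0)
        print(response.outputs)


if __name__ == "__main__":
    predict()
```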

- feat: scaffolding for onnxmlir support - feat: onnxmlir API work in progress, put here as a draft for trying out #2693. Tests will follow accordingly.

Signed-off-by: Aaron Pham. Added zsh completion. Ideally we want to extend click-completion, but right now click-completion is very slow and does not know how to autocomplete bentos and models.

**Is your feature request related to a problem? Please describe.** Currently, Yatai provides a few ways to create and manage deployments: * Web UI (requires being logged in to a Yatai account) *...

help-wanted
good-first-issue

### Describe the bug cc https://bentoml.slack.com/archives/CKRANBHPH/p1658494302553029 TL;DR: When running `bentoml build` locally, it works as expected. However, on an Azure DevOps Python agent, the process seems to hang. ### To reproduce Current...

bug
from community

To solve: - [ ] have DataContainer automatically recognize multiple outputs

feature
feedback-wanted

https://github.com/readthedocs/sphinx_rtd_theme/issues/761

documentation

There is increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress request count, and total request count, using...

feature
documentation
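As a rough illustration of what custom metrics could look like, here is a sketch using `prometheus_client` directly; the metric names and the port are illustrative assumptions, not a BentoML API:

```python
# Illustrative sketch using prometheus_client directly; metric names and
# the /metrics port are assumptions, not part of BentoML's API.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTION_SCORE = Histogram(
    "model_prediction_score", "Distribution of model prediction scores"
)
PREDICTION_ERRORS = Counter(
    "model_prediction_errors_total", "Total number of failed predictions"
)


def predict() -> float:
    try:
        score = random.random()  # stand-in for a real model call
        PREDICTION_SCORE.observe(score)
        return score
    except Exception:
        PREDICTION_ERRORS.inc()
        raise


if __name__ == "__main__":
    start_http_server(8001)  # exposes /metrics for Prometheus to scrape
    while True:
        predict()
        time.sleep(1)
```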

- [ ] `max-latency` & `timeout` (a config sketch follows below)
- [x] API server timeout
- [x] provide both `max-latency` and `timeout` in the BentoServer config
- [x] default `max-latency`: `10s`
- [ ] default...
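A sketch of how both knobs might sit in a BentoServer configuration file; the key names below are assumptions for illustration, not a finalized schema:

```yaml
# Illustrative only: key names are assumptions, not a finalized schema.
api_server:
  timeout: 60              # API server request timeout, in seconds
runners:
  batching:
    max_latency_ms: 10000  # the 10s default max-latency from the checklist
```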

`OMP_NUM_THREADS` must be set before numpy is imported for it to take effect; our current implementation doesn't guarantee that.

feature
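A minimal sketch of the ordering the fix needs to guarantee: the environment variable has to be in place before the numpy import, because the OpenMP runtime reads it once at load time:

```python
import os

# Must be set before numpy (and its BLAS/OpenMP runtime) is loaded; the
# thread count is read once at import time and later changes are ignored.
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # noqa: E402  (deliberately imported after the env var)

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
_ = a @ b  # this matmul now runs single-threaded
```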