BentoML
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Support using gRPC instead of the HTTP API for sending prediction requests. When an API model server is deployed as a backend service, many teams prefer gRPC over HTTP. See...
- feat: scaffolding for onnxmlir support - feat: onnxmlir API work in progress, included here as a draft for trying out #2693. Tests will follow accordingly.
Signed-off-by: Aaron Pham. Added zsh completion. Ideally we want to extend click-completion, but right now click-completion is very slow and does not understand how to autocomplete bentos and models.
**Is your feature request related to a problem? Please describe.** Currently, Yatai provides a few ways to create and manage deployments: * Web UI (requires being logged in to a Yatai account) *...
### Describe the bug cc https://bentoml.slack.com/archives/CKRANBHPH/p1658494302553029 TL;DR: When running `bentoml build` locally, it works as expected. However, on an AzureDevOps Python agent, the process seems to hang. ### To reproduce Current...
To solve: - [ ] have DataContainer automatically recognize multiple outputs
https://github.com/readthedocs/sphinx_rtd_theme/issues/761
There is increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress request count, and total request count, using...
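The three built-in service-level metrics mentioned above can be sketched in plain Python. This is a toy illustration only, not BentoML's implementation (BentoML exposes these via Prometheus); the class and metric names here are illustrative:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ServiceMetrics:
    """Toy sketch of service-level metrics: request count,
    in-progress gauge, and per-request duration."""

    def __init__(self):
        self.request_count = defaultdict(int)   # endpoint -> total requests served
        self.in_progress = 0                    # requests currently being handled
        self.durations = defaultdict(list)      # endpoint -> observed durations (s)

    @contextmanager
    def track(self, endpoint: str):
        # Increment the in-progress gauge for the lifetime of the request,
        # then record duration and bump the counter when it finishes.
        self.in_progress += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[endpoint].append(time.perf_counter() - start)
            self.in_progress -= 1
            self.request_count[endpoint] += 1


metrics = ServiceMetrics()
with metrics.track("/predict"):
    pass  # stand-in for actual model inference
```

A real implementation would register these as Prometheus counter, gauge, and histogram types so they can be scraped, but the bookkeeping pattern is the same.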
- [ ] `max-latency` & `timeout`
  - [x] api server timeout
  - [x] provide both max-latency and timeout in BentoServer config
  - [x] default `max-latency`: `10s`
  - [ ] default...
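As a rough sketch of what the BentoServer configuration above might look like, a YAML fragment along these lines could express both values (the exact key names and nesting here are assumptions, not the finalized schema):

```yaml
# Hypothetical BentoServer config fragment for the checklist above.
api_server:
  traffic:
    timeout: 10          # request timeout in seconds (assumed key)
    max_latency: 10000   # drop requests expected to exceed this, in ms (assumed key)
```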
`OMP_NUM_THREADS` must be set before NumPy is imported for it to take effect; our current implementation does not guarantee that ordering.
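The ordering constraint can be shown in a few lines. The OpenMP runtime reads `OMP_NUM_THREADS` once when it initializes, so setting it after NumPy has already been imported has no effect; the value `"1"` below is purely illustrative:

```python
# Pin the OpenMP thread pool size BEFORE NumPy's first import.
import os

os.environ["OMP_NUM_THREADS"] = "1"  # illustrative value; must precede `import numpy`

import numpy as np  # OpenMP-backed BLAS picks up the setting at init time

# Normal NumPy usage follows; the thread limit is already locked in.
result = np.ones(3).sum()
```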