BentoML
The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
### Feature request

Support passing custom Dockerfile templates via `bentofile.yaml`:

```yaml
docker:
  dockerfile_template: |
    {% extends bento_base_template %}
    {% block SETUP_BENTO_BASE_IMAGE %}
    {{ super() }}
    WORKDIR /tmp
    SHELL [ "bash",...
```
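A complete template of this shape might look like the following. This is a hedged sketch based only on the snippet above: the `bento_base_template` and `SETUP_BENTO_BASE_IMAGE` names come from that snippet, while the `RUN` line and the closing `{% endblock %}` are illustrative additions, not part of the original request.

```yaml
docker:
  dockerfile_template: |
    {% extends bento_base_template %}
    {% block SETUP_BENTO_BASE_IMAGE %}
    {{ super() }}
    # Extra image setup injected into the generated Dockerfile (illustrative)
    WORKDIR /tmp
    RUN apt-get update && apt-get install -y --no-install-recommends curl
    {% endblock %}
```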
### Describe the bug

`import tensorflow` takes too much memory per API server worker. Please see the `RES` column.

Table 1. w/o `import tensorflow`

Memory usage is about 85MB per...
### Describe the bug

Slack conversation: https://bentoml.slack.com/archives/CKRANBHPH/p1658851031266429

Deployed Yatai to GCP with config `dockerBuilder.priviledged=true`. Encountered the error:

```
2022-07-26T15:53:51.598821930Z => [stage-0 2/11] RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Bin 1.2s
2022-07-26T15:53:51.598827496Z =>...
```
A long model inference request can time out the runner. The runner timeout is currently hardcoded at 5 minutes; we should make the runner timeout value configurable.

SageMaker deployment through bentoctl...
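A configuration sketch for the requested change might look like the following. The key name `runners.timeout` is an assumption for illustration, not an option documented at the time of this issue; check your BentoML version's configuration schema before relying on it.

```yaml
# Hypothetical server configuration sketch for the requested feature.
# `runners.timeout` is an assumed key name, not a confirmed API.
runners:
  timeout: 600  # seconds; would replace the hardcoded 5-minute limit
```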
**Describe the bug**

When I add the `--production` flag to the `bentoml serve` command, model serving becomes extremely slow compared to serving without it. The `--production` flag seems to make...
## What does this PR address?

Adding experimental gRPC implementation for BentoServer. Creating a draft PR to better track progress.

## Before submitting:

- [x] Does the Pull Request follow...
Benchmark API server and runner performance against various model and input sizes.
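A minimal harness for this kind of benchmark might look like the sketch below. All names here are hypothetical: it times an arbitrary callable against payloads of varying sizes (stand-ins for the model and input-size sweep the task describes) and reports simple latency statistics.

```python
import statistics
import time

def benchmark(fn, payloads, repeats=5):
    """Time fn over each named payload, returning mean/max latency in ms."""
    results = {}
    for name, payload in payloads.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(payload)  # the call under test (e.g. one inference request)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[name] = {
            "mean_ms": statistics.fmean(samples),
            "max_ms": max(samples),
        }
    return results

# Stand-in "model": summing inputs of different sizes.
payloads = {"small": list(range(1_000)), "large": list(range(100_000))}
report = benchmark(sum, payloads)
```

For a real run, `fn` would issue an HTTP request to the API server or call a runner directly, and the payload sweep would cover the model and input sizes of interest.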
### Feature request

```
NotImplementedError: Support for "bentoml.paddle" is temporarily unavailable as BentoML transition to the new design in version 1.0.0 release. Before this module is officially implemented in BentoML, users...
```