BentoML
API server SLOs
- [ ] `max-latency` & `timeout`
  - [x] api server timeout
  - [x] provide both `max-latency` and `timeout` in BentoServer config
  - [x] default `max-latency`: 10s
  - [ ] default `timeout = 1.5 * max-latency`
  - [ ] assert that `max-latency` < `timeout`
  - [ ] provide `bentoml serve` CLI arg for `--max-latency`
  - [ ] target: 90% < `max-latency`
- [ ] `--max-request-size`?
  - [ ] provide BentoServer config
  - [ ] default to `10MB`
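
To make the checklist concrete, here is a minimal sketch of how the defaults and the `max-latency` < `timeout` assertion could fit together. It is not BentoML code: `ApiServerSLO`, `resolve_slo`, and `meets_latency_target` are hypothetical names, and reading "90% < max-latency" as "at least 90% of requests finish under `max-latency`" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Defaults taken from the checklist above; all names here are hypothetical.
DEFAULT_MAX_LATENCY_S = 10.0
TIMEOUT_FACTOR = 1.5
DEFAULT_MAX_REQUEST_SIZE = 10 * 1024 * 1024  # 10MB, assuming binary megabytes


@dataclass(frozen=True)
class ApiServerSLO:
    max_latency: float      # seconds
    timeout: float          # seconds
    max_request_size: int   # bytes


def resolve_slo(
    max_latency: Optional[float] = None,
    timeout: Optional[float] = None,
    max_request_size: Optional[int] = None,
) -> ApiServerSLO:
    """Fill in the proposed defaults and enforce max_latency < timeout."""
    latency = DEFAULT_MAX_LATENCY_S if max_latency is None else float(max_latency)
    if latency <= 0:
        raise ValueError("max_latency must be positive")
    # Default timeout is derived from max_latency when not given explicitly.
    resolved_timeout = TIMEOUT_FACTOR * latency if timeout is None else float(timeout)
    if not latency < resolved_timeout:
        raise ValueError(
            f"max_latency ({latency}s) must be less than timeout ({resolved_timeout}s)"
        )
    size = DEFAULT_MAX_REQUEST_SIZE if max_request_size is None else int(max_request_size)
    return ApiServerSLO(latency, resolved_timeout, size)


def meets_latency_target(
    latencies: Sequence[float], max_latency: float, fraction: float = 0.9
) -> bool:
    """True if at least `fraction` of observed latencies are under max_latency."""
    if not latencies:
        return True
    within = sum(1 for value in latencies if value < max_latency)
    return within / len(latencies) >= fraction
```

With no arguments, `resolve_slo()` yields a 10s `max_latency` and a 15s `timeout`; passing a `timeout` that is not above `max_latency` raises `ValueError`.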
Should we implement max latency similar to the deadline feature in gRPC, or have a 10s max latency PER runner?
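
To illustrate the difference between the two options, here is a rough sketch of a request-wide deadline in the spirit of gRPC deadlines. `Deadline`, `DeadlineExceeded`, and `per_runner_worst_case` are hypothetical names, not BentoML or gRPC APIs, and the sketch assumes runners are called sequentially.

```python
import time
from typing import Callable


class DeadlineExceeded(Exception):
    """Raised when the request-wide latency budget is used up."""


class Deadline:
    """A single latency budget shared by every step of one request."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def check(self) -> None:
        """Raise if the budget is exhausted; call this before each runner call."""
        if self.remaining() <= 0:
            raise DeadlineExceeded("request exceeded its max latency")


def per_runner_worst_case(num_runners: int, per_runner_budget_s: float) -> float:
    """Worst-case total latency when each sequential runner gets its own budget."""
    return num_runners * per_runner_budget_s
```

Under a shared deadline, each runner call would be given `deadline.remaining()` as its timeout, so the whole request stays within one budget. With a fixed budget per runner, a service that chains three runners at 10s each could take up to 30s end to end.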
Happy to test the implementation down the road & provide feedback. I have a 100% reproducible situation where I run into timeouts even though the code runs fine (I see my result in the terminal)
hey @parano / @bojiang, is the timeout config already implemented? I don't see it in `bentoml serve` (1.0.16) yet... Is it possible to pass this config to the container some other way?
found it, thx:
https://docs.bentoml.org/en/latest/guides/configuration.html
docker run -e BENTOML_CONFIG_OPTIONS='runners.timeout=3600' -it --rm -p 3000:3000 your_service serve --production
Yes, but the `timeout` config on `app` doesn't work currently. We will work on improving this. Thank you.