BentoML

API server SLOs

Open • parano opened this issue 3 years ago • 2 comments

  • [ ] max-latency & timeout
    • [x] api server timeout
    • [x] provide both max-latency and timeout in BentoServer config
    • [x] default max-latency: 10s
    • [ ] default timeout = 1.5 * max-latency
    • [ ] assert that max-latency < timeout
    • [ ] provide bentoml serve CLI arg for --max-latency
    • [ ] target: 90% of requests complete within max-latency
  • [ ] --max-request-size ?
    • [ ] provide BentoServer config
    • [ ] default to 10MB
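The checklist's defaults and constraints can be sketched as a small config object. This is a minimal sketch, not BentoML's API: the class ServerSLOConfig, its field names, the helper meets_latency_target, and reading "10MB" as 10 MiB are all hypothetical. It derives timeout = 1.5 * max-latency when no timeout is given, rejects configs where max-latency is not below timeout, and reads the "90%" target as "at least 90% of requests finish within max-latency".

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical defaults taken from the checklist; NOT BentoML's real config keys.
DEFAULT_MAX_LATENCY_S = 10.0
TIMEOUT_FACTOR = 1.5  # default timeout = 1.5 * max-latency
DEFAULT_MAX_REQUEST_SIZE = 10 * 1024 * 1024  # "10MB", assumed to mean 10 MiB


@dataclass
class ServerSLOConfig:
    """Hypothetical API-server SLO settings mirroring the checklist items."""

    max_latency: float = DEFAULT_MAX_LATENCY_S
    timeout: Optional[float] = None  # None -> derived from max_latency
    max_request_size: int = DEFAULT_MAX_REQUEST_SIZE

    def __post_init__(self) -> None:
        if self.timeout is None:
            self.timeout = TIMEOUT_FACTOR * self.max_latency
        # Checklist item: assert that max-latency < timeout.
        if not self.max_latency < self.timeout:
            raise ValueError(
                f"max_latency ({self.max_latency}s) must be < timeout ({self.timeout}s)"
            )


def meets_latency_target(
    latencies_s: Sequence[float], max_latency: float, fraction: float = 0.9
) -> bool:
    """Hypothetical SLO check: at least `fraction` of requests finish within max_latency."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    within = sum(1 for t in latencies_s if t <= max_latency)
    return within / len(latencies_s) >= fraction
```

Deriving timeout from max-latency means setting only one value (for example max_latency=4.0) keeps the pair consistent (timeout becomes 6.0s), while an explicit timeout that is too small fails fast at startup instead of at request time.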

parano • Jan 04 '22 01:01

Should we implement max latency like the deadline feature in gRPC, or apply a 10s max latency PER runner?
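The two options in the question differ in where the time budget lives. The sketch below assumes runners can be modeled as async callables; make_runner, handle_request, handle_request_per_runner and DeadlineExceeded are hypothetical names, not BentoML APIs. With a gRPC-style deadline, one budget is fixed when the request arrives and each runner call gets only what remains of it; with a per-runner limit, every call gets a fresh timeout, so a chain of N runners can take up to N times that limit.

```python
import asyncio
import time


class DeadlineExceeded(Exception):
    """Raised when the request-wide budget runs out (hypothetical)."""


def make_runner(delay: float, value: str):
    """Hypothetical stand-in for a runner call that takes `delay` seconds."""
    async def runner() -> str:
        await asyncio.sleep(delay)
        return value
    return runner


async def call_with_deadline(runner, deadline: float):
    # Give this call only the time left before the shared deadline.
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise DeadlineExceeded("deadline already passed")
    try:
        return await asyncio.wait_for(runner(), timeout=remaining)
    except asyncio.TimeoutError:
        raise DeadlineExceeded("runner call exceeded remaining budget") from None


async def handle_request(runners, max_latency: float):
    """Deadline style: one budget shared across all runner calls."""
    deadline = time.monotonic() + max_latency
    return [await call_with_deadline(r, deadline) for r in runners]


async def handle_request_per_runner(runners, per_runner_timeout: float):
    """Per-runner style: each call gets its own fresh timeout."""
    return [await asyncio.wait_for(r(), timeout=per_runner_timeout) for r in runners]
```

With two runners that each take 0.3s, a 0.5s request-wide deadline fails on the second call, while a 0.5s per-runner timeout lets the request finish in about 0.6s, which is longer than the intended 0.5s end-to-end latency.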

ssheng avatar Jan 04 '22 02:01 ssheng

Happy to test the implementation down the road and provide feedback. I have a 100% reproducible case where requests time out even though the code runs fine (I see my result in the terminal).

chris-aeviator • Jan 19 '22 08:01

Hey @parano / @bojiang, is the timeout config already implemented? I don't see it in bentoml serve (1.0.16) yet. Is it possible to pass this config to the container some other way?

Found it, thanks: https://docs.bentoml.org/en/latest/guides/configuration.html

    docker run -e BENTOML_CONFIG_OPTIONS='runners.timeout=3600' -it --rm -p 3000:3000 your_service serve --production
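The override runners.timeout=3600 addresses a nested config key through dot-separated path segments. The sketch below only illustrates that mapping and is not BentoML's actual parser: parse_config_options is a hypothetical helper, and splitting multiple overrides on whitespace and coercing numeric strings to int or float are assumptions.

```python
def _coerce(raw: str):
    # Assumption: numeric-looking values become int or float; everything else stays a string.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_config_options(options: str) -> dict:
    """Hypothetical: turn 'a.b=1 c=x' into {'a': {'b': 1}, 'c': 'x'}."""
    config: dict = {}
    for item in options.split():  # assumption: whitespace-separated overrides
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} is already set to a scalar value")
        node[leaf] = _coerce(raw)
    return config
```

For example, parse_config_options("runners.timeout=3600") yields {"runners": {"timeout": 3600}}, i.e. the timeout key nested under runners.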

nadworny • Apr 21 '23 11:04

Yes, but the timeout config on the app currently doesn't work. We will work on improving this. Thank you.

frostming • Apr 28 '23 10:04