BentoML
                                
API server SLOs
- [ ] max-latency & timeout
  - [x] api server timeout
  - [x] provide both max-latency and timeout in BentoServer config
  - [x] default max-latency: 10s
  - [ ] default timeout = 1.5 * max-latency
  - [ ] assert that max-latency < timeout
  - [ ] provide bentoml serve CLI arg for --max-latency
  - [ ] target: 90% of requests < max-latency
 
- [ ] --max-request-size?
  - [ ] provide BentoServer config
  - [ ] default to 10MB
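
The defaults and the max-latency < timeout check from the list above could look roughly like the following. This is only a sketch for discussion: `ApiServerSLO` and its field names are hypothetical and are not part of BentoML's actual config schema.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults taken from the task list above.
DEFAULT_MAX_LATENCY_S = 10.0
TIMEOUT_FACTOR = 1.5
DEFAULT_MAX_REQUEST_SIZE = 10 * 1024 * 1024  # 10MB


@dataclass
class ApiServerSLO:
    """Hypothetical container for the proposed settings (not a BentoML class)."""

    max_latency: float = DEFAULT_MAX_LATENCY_S
    timeout: Optional[float] = None  # derived from max_latency when unset
    max_request_size: int = DEFAULT_MAX_REQUEST_SIZE

    def __post_init__(self) -> None:
        # default timeout = 1.5 * max-latency
        if self.timeout is None:
            self.timeout = TIMEOUT_FACTOR * self.max_latency
        # assert that max-latency < timeout
        if not self.max_latency < self.timeout:
            raise ValueError(
                f"max_latency ({self.max_latency}s) must be less than "
                f"timeout ({self.timeout}s)"
            )
```

With no arguments this yields a 10s max latency and a 15s timeout; passing a timeout at or below max-latency raises a `ValueError`.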
 
Should we implement max latency similar to the deadline feature in gRPC, or have a 10s max latency per runner?
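
To make the two options concrete, here is a rough asyncio sketch of the difference. It assumes runners are called one after another, and `fake_runner`, `call_with_deadline` and `call_per_runner` are made-up names for illustration, not BentoML or gRPC APIs.

```python
import asyncio
import time
from typing import List


async def fake_runner(duration: float) -> float:
    """Stand-in for a runner call that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return duration


async def call_with_deadline(durations: List[float], budget: float) -> List[float]:
    """Deadline style: one time budget shared by all runner calls in a request."""
    deadline = time.monotonic() + budget
    results = []
    for d in durations:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise asyncio.TimeoutError()
        # Each call only gets whatever is left of the request budget.
        results.append(await asyncio.wait_for(fake_runner(d), timeout=remaining))
    return results


async def call_per_runner(durations: List[float], per_runner_timeout: float) -> List[float]:
    """Per-runner style: every call gets its own full timeout."""
    results = []
    for d in durations:
        results.append(await asyncio.wait_for(fake_runner(d), timeout=per_runner_timeout))
    return results
```

With a per-runner limit, a request that chains N runners can take up to N times the limit in total, while a shared deadline bounds the end-to-end latency of the whole request.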
Happy to test the implementation down the road and provide feedback. I have a 100% reproducible case where I run into timeouts even though the code runs fine (I see my result in the terminal).
Hey @parano / @bojiang, is the timeout config already implemented? I don't see it in bentoml serve (1.0.16) yet. Is it possible to pass this config to the container some other way?
found it, thx:
https://docs.bentoml.org/en/latest/guides/configuration.html
docker run -e BENTOML_CONFIG_OPTIONS='runners.timeout=3600' -it --rm -p 3000:3000 your_service serve --production
Yes, but the timeout config on app doesn't work currently. We will work on improving this. Thank you.