BentoML feat: implement batching strategies

This adds a new configuration value, runner.batching.target_latency_ms, which controls how long the dispatcher will wait before beginning to execute requests.

This could use some testing to compare setting it to 0 against leaving it as ~, but for now giving users more knobs to tweak is probably a good thing; I suspect at least a few people will want infinite max latency without long wait times for requests that arrive after a burst.

EDIT: This PR has now been updated to provide a strategy option in the configuration, which allows a user to define which strategy they would like to use.
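
For readers skimming the thread, here is a minimal sketch of how a configurable strategy with a target latency could behave. It is not the code in this PR: `TargetLatencyStrategy`, `should_dispatch`, `STRATEGIES`, and `make_strategy` are hypothetical names, and the strategy name `"target_latency"` is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TargetLatencyStrategy:
    """Hypothetical strategy: hold requests until the oldest has waited long enough."""

    target_latency_ms: float
    max_batch_size: int = 32  # assumed default, for illustration only

    def should_dispatch(self, queue_len: int, oldest_wait_ms: float) -> bool:
        # Nothing queued, nothing to run.
        if queue_len == 0:
            return False
        # A full batch is dispatched immediately.
        if queue_len >= self.max_batch_size:
            return True
        # Otherwise wait until the oldest request has waited the target time.
        return oldest_wait_ms >= self.target_latency_ms


# Hypothetical registry mapping a configured strategy name to a class.
STRATEGIES = {"target_latency": TargetLatencyStrategy}


def make_strategy(name: str, **options):
    """Build the strategy named in the configuration, passing its options through."""
    try:
        cls = STRATEGIES[name]
    except KeyError:
        raise ValueError(f"unknown batching strategy: {name!r}") from None
    return cls(**options)
```

In this sketch, a target of 0 dispatches as soon as any request is queued, while a larger target trades latency for bigger batches.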

/cc @timliubentoml

Mar 02 '23 02:03 sauyon

Codecov Report

Merging #3630 (9db629e) into main (33c8440) will increase coverage by 31.85%. Report is 112 commits behind head on main. The diff coverage is 9.09%.

:exclamation: Current head 9db629e differs from pull request most recent head 56088fe. Consider uploading reports for the commit 56088fe to get more accurate results

@@            Coverage Diff             @@
##            main    #3630       +/-   ##
==========================================
+ Coverage   0.00%   31.85%   +31.85%     
==========================================
  Files        166      146       -20     
  Lines      15286    12038     -3248     
  Branches       0     1989     +1989     
==========================================
+ Hits           0     3835     +3835     
+ Misses     15286     7928     -7358     
- Partials       0      275      +275

Files Changed	Coverage Δ
src/bentoml/_internal/configuration/v1/__init__.py	`48.83% <ø> (+48.83%)`	:arrow_up:
src/bentoml/_internal/marshal/dispatcher.py	`0.00% <0.00%> (ø)`
src/bentoml/_internal/models/model.py	`77.59% <ø> (+77.59%)`	:arrow_up:
src/bentoml/_internal/server/runner_app.py	`0.00% <ø> (ø)`
src/bentoml/triton.py	`0.00% <ø> (ø)`
src/bentoml/_internal/runner/runner.py	`56.61% <66.66%> (+56.61%)`	:arrow_up:

... and 119 files with indirect coverage changes

Mar 02 '23 02:03 codecov[bot]

Oh, I'd forgotten about ruff. Man, it checks fast :sweat_smile:

Mar 02 '23 20:03 sauyon

This one is waiting on me to change some naming around, need to get to that.

Mar 10 '23 04:03 sauyon

Had some discussion about this PR with Sauyon. These are the decisions:

- add back pressure handling logic to the new strategy
- adjust the refactoring so that the statistical regression moves into the Intelligent Wait strategy
- move max_batch_size and max_latency into strategy_options
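
As a rough illustration of the last decision, the sketch below nests the two options under `strategy_options` in a plain dictionary. The helper `move_into_strategy_options`, the `"strategy"` key, and the value `"intelligent_wait"` are hypothetical; the option names are taken as written in this comment and may not match the final configuration schema.

```python
def move_into_strategy_options(batching: dict) -> dict:
    """Return a copy of a batching config with the two options nested.

    Hypothetical helper: the input is not modified, and values already set
    under strategy_options take precedence over the flat ones.
    """
    moved = ("max_batch_size", "max_latency")
    # Keep every key except the ones being relocated.
    result = {k: v for k, v in batching.items() if k not in moved}
    options = dict(result.get("strategy_options", {}))
    for key in moved:
        if key in batching:
            options.setdefault(key, batching[key])
    result["strategy_options"] = options
    return result


flat = {
    "strategy": "intelligent_wait",  # assumed name, for illustration only
    "max_batch_size": 100,
    "max_latency": 10000,
}
nested = move_into_strategy_options(flat)
```

Here `nested` keeps `strategy` at the top level and carries `max_batch_size` and `max_latency` inside `strategy_options`.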

Mar 23 '23 06:03 bojiang

@bojiang this should be ok to look at for now, broad strokes.

Apr 18 '23 08:04 sauyon

I think this should be ready for review now if anybody wants to take a look (@bojiang I implemented wait time).

Once I add some tests I'll probably factor this into separate commits.

Apr 25 '23 03:04 sauyon

status: We probably want a load test before merging this one in.

May 09 '23 00:05 aarnphm

Is this likely to be reviewed and merged?

Aug 11 '23 08:08 judahrand

Hello @sauyon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file src/bentoml/_internal/marshal/dispatcher.py:

Line 72:80: E501 line too long (83 > 79 characters)
Line 103:80: E501 line too long (101 > 79 characters)
Line 131:80: E501 line too long (85 > 79 characters)
Line 162:80: E501 line too long (84 > 79 characters)
Line 213:80: E501 line too long (92 > 79 characters)
Line 319:80: E501 line too long (88 > 79 characters)
Line 476:80: E501 line too long (87 > 79 characters)
Line 541:80: E501 line too long (82 > 79 characters)
Line 558:80: E501 line too long (81 > 79 characters)

In the file src/bentoml/_internal/runner/runner.py:

Line 202:80: E501 line too long (107 > 79 characters)
Line 203:80: E501 line too long (111 > 79 characters)
Line 205:80: E501 line too long (110 > 79 characters)
Line 267:80: E501 line too long (125 > 79 characters)
Line 271:80: E501 line too long (86 > 79 characters)
Line 274:80: E501 line too long (122 > 79 characters)
Line 284:80: E501 line too long (80 > 79 characters)
Line 286:80: E501 line too long (158 > 79 characters)
Line 300:80: E501 line too long (83 > 79 characters)

Comment last updated at 2023-09-20 02:31:21 UTC

Sep 20 '23 01:09 pep8speaks
