feat: implement batching strategies

Open sauyon opened this issue 2 years ago • 9 comments

This adds a new configuration value, runner.batching.target_latency_ms, which controls how long the dispatcher will wait before beginning to execute requests.

This could probably do with a bit of testing to see how setting it to 0 performs versus leaving it as ~ (YAML null), but for now, adding more knobs for users to tweak is probably a good thing. I suspect at least a few people will want the behavior of infinite max latency without long wait times for requests that arrive after a burst.
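For readers following along, here is a minimal sketch of what a target-latency wait could look like. It is not the dispatcher code from this PR: `collect_batch`, the `asyncio.Queue` input, and the exact treatment of a value of 0 (dispatch whatever is already queued) are assumptions made for illustration.

```python
import asyncio


async def collect_batch(queue: asyncio.Queue, max_batch_size: int,
                        target_latency_ms: float) -> list:
    """Hypothetical helper: gather one batch of requests from `queue`.

    Blocks until at least one request is available, then keeps collecting
    until the batch is full or `target_latency_ms` has elapsed since the
    first request was taken.
    """
    batch = [await queue.get()]
    loop = asyncio.get_running_loop()
    deadline = loop.time() + target_latency_ms / 1000.0

    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        try:
            if remaining > 0:
                # Still inside the wait window: wait for the next request.
                item = await asyncio.wait_for(queue.get(), timeout=remaining)
            else:
                # Window is over: only take requests that are already queued.
                item = queue.get_nowait()
        except (asyncio.TimeoutError, asyncio.QueueEmpty):
            break
        batch.append(item)
    return batch
```

In this sketch a larger `target_latency_ms` trades per-request latency for fuller batches, while 0 never waits beyond the first request.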

EDIT: This PR has now been updated to provide a strategy option in the configuration, which allows a user to define which strategy they would like to use.
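A rough sketch of how a configurable strategy could be selected by name, assuming a simple registry. The class names, the strategy names `target_latency` and `full_batch`, the `should_dispatch` interface, and `build_strategy` are all hypothetical and are not taken from this PR; only the `strategy` option and the `strategy_options` key (mentioned later in this thread) come from the discussion.

```python
from abc import ABC, abstractmethod


class BatchingStrategy(ABC):
    """Hypothetical interface: decide when queued requests get dispatched."""

    @abstractmethod
    def should_dispatch(self, queued: int, oldest_wait_ms: float) -> bool:
        ...


class TargetLatencyStrategy(BatchingStrategy):
    """Dispatch when the batch is full or the oldest request waited long enough."""

    def __init__(self, max_batch_size: int = 100, target_latency_ms: float = 0.0):
        self.max_batch_size = max_batch_size
        self.target_latency_ms = target_latency_ms

    def should_dispatch(self, queued: int, oldest_wait_ms: float) -> bool:
        if queued == 0:
            return False
        return (queued >= self.max_batch_size
                or oldest_wait_ms >= self.target_latency_ms)


class FullBatchStrategy(BatchingStrategy):
    """Dispatch only once a full batch has accumulated."""

    def __init__(self, max_batch_size: int = 100):
        self.max_batch_size = max_batch_size

    def should_dispatch(self, queued: int, oldest_wait_ms: float) -> bool:
        return queued >= self.max_batch_size


# Registry mapping configuration names to strategy classes (names are made up).
STRATEGIES = {
    "target_latency": TargetLatencyStrategy,
    "full_batch": FullBatchStrategy,
}


def build_strategy(batching_config: dict) -> BatchingStrategy:
    """Instantiate the strategy named in a parsed batching configuration."""
    name = batching_config.get("strategy", "target_latency")
    options = batching_config.get("strategy_options", {})
    if name not in STRATEGIES:
        raise ValueError(
            f"unknown batching strategy {name!r}; expected one of {sorted(STRATEGIES)}"
        )
    return STRATEGIES[name](**options)
```

Keeping the per-strategy settings in their own mapping means each strategy only has to validate the options it actually understands.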

/cc @timliubentoml

sauyon avatar Mar 02 '23 02:03 sauyon

Codecov Report

Merging #3630 (9db629e) into main (33c8440) will increase coverage by 31.85%. Report is 112 commits behind head on main. The diff coverage is 9.09%.

:exclamation: Current head 9db629e differs from pull request most recent head 56088fe. Consider uploading reports for the commit 56088fe to get more accurate results

@@            Coverage Diff             @@
##            main    #3630       +/-   ##
==========================================
+ Coverage   0.00%   31.85%   +31.85%     
==========================================
  Files        166      146       -20     
  Lines      15286    12038     -3248     
  Branches       0     1989     +1989     
==========================================
+ Hits           0     3835     +3835     
+ Misses     15286     7928     -7358     
- Partials       0      275      +275     

| Files Changed | Coverage Δ |
|---|---|
| `src/bentoml/_internal/configuration/v1/__init__.py` | 48.83% <ø> (+48.83%) :arrow_up: |
| `src/bentoml/_internal/marshal/dispatcher.py` | 0.00% <0.00%> (ø) |
| `src/bentoml/_internal/models/model.py` | 77.59% <ø> (+77.59%) :arrow_up: |
| `src/bentoml/_internal/server/runner_app.py` | 0.00% <ø> (ø) |
| `src/bentoml/triton.py` | 0.00% <ø> (ø) |
| `src/bentoml/_internal/runner/runner.py` | 56.61% <66.66%> (+56.61%) :arrow_up: |

... and 119 files with indirect coverage changes

codecov[bot] avatar Mar 02 '23 02:03 codecov[bot]

Oh, I'd forgotten about ruff. Man, it checks fast :sweat_smile:

sauyon avatar Mar 02 '23 20:03 sauyon

This one is waiting on me to change some naming around; I need to get to that.

sauyon avatar Mar 10 '23 04:03 sauyon

Had some discussion about this PR with Sauyon. These are the decisions:

  1. Add back-pressure handling logic to the new strategy.
  2. Adjust the refactoring: move the statistical regression into the Intelligent Wait strategy.
  3. Move max_batch_size and max_latency into strategy_options.
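To make the first and third decisions concrete, here is a small sketch of one possible back-pressure check that reads its limits from `strategy_options`. The `admit` function, the `max_latency_ms` key spelling, and the per-batch time estimate `est_batch_ms` are assumptions for illustration, not the logic implemented in this PR.

```python
def admit(queue_len: int, strategy_options: dict, est_batch_ms: float) -> bool:
    """Hypothetical back-pressure check for a newly arriving request.

    Estimates how long the request would wait if it joined the queue and
    rejects it when that estimate exceeds the configured max latency.
    """
    max_batch_size = strategy_options["max_batch_size"]
    max_latency_ms = strategy_options["max_latency_ms"]  # assumed key name

    # Number of batches that must run before this request completes,
    # counting the batch that the request itself would land in.
    batches_needed = queue_len // max_batch_size + 1
    estimated_wait_ms = batches_needed * est_batch_ms
    return estimated_wait_ms <= max_latency_ms


options = {"max_batch_size": 4, "max_latency_ms": 100}
accepted = admit(queue_len=8, strategy_options=options, est_batch_ms=30.0)
rejected = admit(queue_len=12, strategy_options=options, est_batch_ms=30.0)
```

A real dispatcher would presumably derive the batch-time estimate from observed executions rather than take it as a constant; the constant here only keeps the example short.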

bojiang avatar Mar 23 '23 06:03 bojiang

@bojiang this should be ok to look at for now, broad strokes.

sauyon avatar Apr 18 '23 08:04 sauyon

I think this should be ready for review now if anybody wants to take a look (@bojiang I implemented wait time).

Once I add some tests I'll probably factor this into separate commits.

sauyon avatar Apr 25 '23 03:04 sauyon

status: We probably want a load test before merging this one in.

aarnphm avatar May 09 '23 00:05 aarnphm

Is this likely to be reviewed and merged?

judahrand avatar Aug 11 '23 08:08 judahrand

Hello @sauyon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 72:80: E501 line too long (83 > 79 characters)
Line 103:80: E501 line too long (101 > 79 characters)
Line 131:80: E501 line too long (85 > 79 characters)
Line 162:80: E501 line too long (84 > 79 characters)
Line 213:80: E501 line too long (92 > 79 characters)
Line 319:80: E501 line too long (88 > 79 characters)
Line 476:80: E501 line too long (87 > 79 characters)
Line 541:80: E501 line too long (82 > 79 characters)
Line 558:80: E501 line too long (81 > 79 characters)

Line 202:80: E501 line too long (107 > 79 characters)
Line 203:80: E501 line too long (111 > 79 characters)
Line 205:80: E501 line too long (110 > 79 characters)
Line 267:80: E501 line too long (125 > 79 characters)
Line 271:80: E501 line too long (86 > 79 characters)
Line 274:80: E501 line too long (122 > 79 characters)
Line 284:80: E501 line too long (80 > 79 characters)
Line 286:80: E501 line too long (158 > 79 characters)
Line 300:80: E501 line too long (83 > 79 characters)

Comment last updated at 2023-09-20 02:31:21 UTC

pep8speaks avatar Sep 20 '23 01:09 pep8speaks