BentoML
fix: memory issue when pushing large bentos
What does this PR address?
Adds support for limiting the maximum memory usage when pushing models:
bentoml push facebook--opt-2.7b-service:905a4b602cda5c501f1b3a2650a4152680238254 --maxmemory 2
Test case 1:
Pushing bento `google--flan-t5-large-service`, model size 2.92 GiB
- no limit
  - time consumed: 3 min 58 s
  - memory usage: ~3 GB
- maxmemory = 1
  - time consumed: 4 min 25 s
  - memory usage: < 1 GB
Test case 2:
Pushing bento `google--flan-t5-large-service`, model size 12.55 GiB
- maxmemory = 3
  - time consumed: 4 min 48 s
  - memory usage: max ~4 GB
Fixes #(issue)
Before submitting:
- [x] Does the Pull Request follow Conventional Commits specification naming? Here is GitHub's guide on how to create a pull request.
- [x] Does the code follow BentoML's code style, and has the `pre-commit run -a` script passed (instructions)?
- [x] Did you read through contribution guidelines and follow development guidelines?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are documentation guidelines and tips on writing docs.
- [ ] Did you write tests to cover your changes?
Hello @xianml! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2023-09-26 02:14:35 UTC
What are the bugs within the current implementation?
Context is in https://bentoml-team.slack.com/archives/C02QLC8RB5W/p1695088745009929
In summary, `bentoml push` currently takes too much memory because it uses io.BytesIO. This fix uses SpooledTemporaryFile instead to cap the memory usage.
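The difference between the two buffers can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from this PR: the helper name `build_archive`, the member name `model.bin`, and the way a `--maxmemory` value would be converted into `max_memory_bytes` are all assumptions made for the example.

```python
import io
import tarfile
import tempfile


def build_archive(payload: bytes, max_memory_bytes: int):
    """Pack payload into a tar archive held in a spooled temporary file.

    Hypothetical helper for illustration; not BentoML's actual implementation.
    """
    # Unlike io.BytesIO, which keeps every byte in RAM, a SpooledTemporaryFile
    # buffers in memory only until max_size is exceeded, then moves to disk.
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory_bytes)
    with tarfile.open(fileobj=spooled, mode="w:") as tar:
        info = tarfile.TarInfo(name="model.bin")  # assumed member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    spooled.seek(0)  # rewind so the caller can stream the archive from the start
    return spooled


if __name__ == "__main__":
    # Cap the in-memory buffer at 1 MiB while archiving 4 MiB of data.
    with build_archive(b"\0" * (4 * 1024 * 1024), max_memory_bytes=1024 * 1024) as f:
        while chunk := f.read(64 * 1024):
            pass  # an uploader would send each chunk here
```

The trade-off matches the test cases above: spilling to disk bounds memory usage at the cost of some extra time spent on file I/O.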
Looks like unit tests are failing, maybe because of the requests change...?
Should we just use a `SpooledTemporaryFile` for everything?
@sauyon you need to change the tests to patch `httpx` instead of `requests`. Let's open a separate PR to fix the tests?
> Looks like unit tests are failing, maybe because of the requests change...?
> Should we just use a `SpooledTemporaryFile` for everything?
- I checked the failed unit test logs; it seems our test cases are out of date.
- For now, I only use SpooledTemporaryFile for pushing models. Shall we replace the other usages one by one? I am not very confident about making a big code change.