BentoML
fix: memory issue when pushing large bentos
What does this PR address?
Adds support for limiting the maximum memory usage when pushing models, via a new `--maxmemory` option:
bentoml push facebook--opt-2.7b-service:905a4b602cda5c501f1b3a2650a4152680238254 --maxmemory 2
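The general idea behind capping memory during a push can be sketched as reading the archive in fixed-size chunks instead of loading it whole. This is a minimal illustration only, not the code in this PR: `chunk_size_for_budget`, `iter_chunks`, `push_stream`, the `upload_part` callback, and the `in_flight` parameter are all hypothetical names, and treating the `--maxmemory` value as GiB is an assumption based on the test results below.

```python
import io
from typing import BinaryIO, Callable, Iterator

GIB = 1024 ** 3


def chunk_size_for_budget(max_memory_gib: float, in_flight: int = 4) -> int:
    """Split a memory budget evenly across the chunks held in memory at once.

    Assumption: the budget is expressed in GiB and `in_flight` chunks may be
    buffered at the same time (for example, by concurrent part uploads).
    """
    if max_memory_gib <= 0 or in_flight <= 0:
        raise ValueError("max_memory_gib and in_flight must be positive")
    return max(1, int(max_memory_gib * GIB) // in_flight)


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunk_size` bytes from `stream`."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def push_stream(
    stream: BinaryIO,
    chunk_size: int,
    upload_part: Callable[[int, bytes], None],
) -> int:
    """Upload `stream` part by part and return the total bytes sent.

    `upload_part` is a hypothetical callback standing in for whatever
    transport the real client uses (for example, a multipart upload).
    """
    total = 0
    for part_number, chunk in enumerate(iter_chunks(stream, chunk_size), start=1):
        upload_part(part_number, chunk)
        total += len(chunk)
    return total
```

With this shape, peak memory is bounded by roughly `chunk_size * in_flight` rather than by the model size, which is consistent with the small increase in push time reported in the test cases.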
Test case 1:
pushing bento google--flan-t5-large-service, model size 2.92 GiB
- no limit
  - time consumed: 3min 58s
  - memory usage: ~3 GB
- maxmemory = 1
  - time consumed: 4min 25s
  - memory usage: <1 GB
Test case 2:
pushing bento google--flan-t5-large-service, model size 12.55 GiB
- maxmemory = 3
  - time consumed: 4min 48s
  - memory usage: max ~4 GB
Fixes #(issue)
Before submitting:
- [x] Does the Pull Request follow Conventional Commits specification naming? Here is GitHub's guide on how to create a pull request.
- [x] Does the code follow BentoML's code style, and has the `pre-commit run -a` script passed (instructions)?
- [x] Did you read through contribution guidelines and follow development guidelines?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are documentation guidelines and tips on writing docs.
- [ ] Did you write tests to cover your changes?