bug: Win 11, CUDA, bentoml serve. API service significantly slows SD generation.
Describe the bug
SD generation is incredibly slow with pytorch 1.13.1 + CUDA 11.7 when I run `BENTOML_CONFIG=configuration.yaml PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 bentoml serve service:svc --production` locally.
When I submit the request through Postman, generation is roughly 100x slower while the API service is processing the request than if I simply cancel it. Once the request is cancelled, generation time drops from about 10 minutes to about 10-20 seconds.
I can confirm that `self.device` is `"cuda"`.
To reproduce
- BENTOML_CONFIG=configuration.yaml PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 bentoml serve service:svc --production
- Submit a request to txt2img through Postman
- Observe that generation takes a very long time (about 10 minutes)
- Cancel the request
- Observe that SD generation now completes much faster (about 10-20 seconds)
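The submit-then-cancel pattern above can also be reproduced without Postman. The sketch below builds the HTTP request with the standard library; the `/txt2img` endpoint path and the JSON payload shape are assumptions about the service API, not taken from the actual `service:svc` code, so adjust them to match:

```python
import json
import urllib.request

# Endpoint path and payload shape are assumptions about a typical
# txt2img BentoML service; adjust to match the real service:svc API.
def build_txt2img_request(base_url: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/txt2img",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def submit(req: urllib.request.Request, timeout_s: float = 30.0) -> bytes:
    # A short client timeout mimics cancelling the Postman request:
    # the client gives up while the server continues on its own.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read()

req = build_txt2img_request("http://127.0.0.1:3000", "a red fox in snow")
```

Calling `submit(req, timeout_s=10)` against the running server and letting it time out reproduces the "cancelled request" case programmatically.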
I added `image.save('./img.jpg')` before the response so that I can inspect the generated image, and the image is generated correctly.
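For reference, a minimal sketch of that debug check (the helper name and placeholder image are hypothetical, not the actual service code):

```python
from PIL import Image

def debug_save(image: Image.Image, path: str = "./img.jpg") -> str:
    # Persist the generated image before returning the response, so the
    # output can be inspected even when the HTTP response never arrives.
    image.convert("RGB").save(path, format="JPEG")
    return path

# Placeholder image standing in for the Stable Diffusion output:
placeholder = Image.new("RGB", (64, 64), color="gray")
saved_path = debug_save(placeholder, "img.jpg")
```

Since the saved file looks correct, the slowdown appears to be in serving the response rather than in generation itself.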
Expected behavior
SD should generate an image in under 15 seconds on this hardware.
Environment
Environment variable
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
System information
bentoml: 1.0.13
python: 3.10.5
platform: Windows-10-10.0.22621-SP0
is_window_admin: False
pip_packages
accelerate==0.15.0
aiohttp==3.8.3
aiosignal==1.3.1
anyio==3.6.2
appdirs==1.4.4
asgiref==3.6.0
async-timeout==4.0.2
attrs==22.2.0
backoff==2.2.1
bentoml==1.0.13
build==0.10.0
cattrs==22.2.0
certifi==2022.12.7
charset-normalizer==2.1.1
circus==0.18.0
click==8.1.3
click-option-group==0.5.5
cloudpickle==2.2.1
colorama==0.4.6
contextlib2==21.6.0
deepmerge==1.1.0
Deprecated==1.2.13
diffusers==0.11.1
exceptiongroup==1.1.0
fastapi==0.89.1
filelock==3.9.0
frozenlist==1.3.3
fs==2.4.16
ftfy==6.1.1
googleapis-common-protos==1.58.0
h11==0.14.0
huggingface-hub==0.11.1
idna==3.4
importlib-metadata==6.0.0
Jinja2==3.1.2
markdown-it-py==2.1.0
MarkupSafe==2.1.2
mdurl==0.1.2
multidict==6.0.4
numpy==1.24.1
opentelemetry-api==1.14.0
opentelemetry-exporter-otlp-proto-http==1.14.0
opentelemetry-instrumentation==0.35b0
opentelemetry-instrumentation-aiohttp-client==0.35b0
opentelemetry-instrumentation-asgi==0.35b0
opentelemetry-proto==1.14.0
opentelemetry-sdk==1.14.0
opentelemetry-semantic-conventions==0.35b0
opentelemetry-util-http==0.35b0
packaging==21.3
pathspec==0.10.3
Pillow==9.4.0
pip-requirements-parser==32.0.1
pip-tools==6.12.1
prometheus-client==0.15.0
protobuf==3.20.3
psutil==5.9.4
pydantic==1.10.4
Pygments==2.14.0
pynvml==11.4.1
pyparsing==3.0.9
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-json-logger==2.0.4
python-multipart==0.0.5
PyYAML==6.0
pyzmq==25.0.0
regex==2022.10.31
requests==2.28.2
rich==13.2.0
schema==0.7.5
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.22.0
tokenizers==0.13.2
tomli==2.0.1
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117
tornado==6.2
tqdm==4.64.1
transformers==4.25.1
typing_extensions==4.4.0
urllib3==1.26.14
uvicorn==0.20.0
watchfiles==0.18.1
wcwidth==0.2.6
wrapt==1.14.1
yarl==1.8.2
zipp==3.11.0
Can you try it on the latest version of BentoML? Thanks!