onediff
Acceleration fails with multiple resolutions and a large batch size
Describe the bug
On an A10 GPU (24 GB VRAM), an error occurs when accelerating multiple resolutions while generating 2 images per resolution.
Your environment
diffusers==0.27.0
transformers==4.38.2
xformers==0.0.23.post1
peft==0.7.1
# For CN users
python3 -m pip install -U --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu121
python3 -m pip install --pre onediff
git clone https://github.com/siliconflow/onediff.git
cd onediff/onediff_diffusers_extensions && python3 -m pip install -e .
How To Reproduce
Steps to reproduce the behavior(code or script):
import time
from PIL import Image
import oneflow as flow
import torch
from onediff.infer_compiler import oneflow_compile
from diffusers import LCMScheduler
from third_party.diffusers_mc.pipeline_stable_diffusion_xl_img2img import StableDiffusionXLImg2ImgPipeline
model_dir = 'ckpts/playground-v2'
# Model load and compile
pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.safety_checker = None
pipeline.to('cuda', torch_dtype=torch.float16)
pipeline.unet = oneflow_compile(pipeline.unet)
pipeline.vae.decoder = oneflow_compile(pipeline.vae.decoder)
prompt = "a photo of an astronaut riding a horse on mars"
# Warm-up
warmup_sizes = [(1024, 1024)]
for size in warmup_sizes:
    _ = pipeline(prompt=prompt, height=size[0], width=size[1])
# Normal inference
inference_sizes = [(1024, 1024), (512, 2048), (2048, 512)]
for size in inference_sizes:
    start_time = time.time()
    image = pipeline(
        prompt=prompt,
        height=size[0],
        width=size[1],
        num_inference_steps=4,
        num_images_per_prompt=2,
        strength=1.0,
    ).images[0]
    end_time = time.time()
    print('time:', end_time - start_time)
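For reference, here is a small sketch of the latent tensor shapes the three inference sizes produce with `num_images_per_prompt=2`. It assumes the usual SDXL layout (4 latent channels, VAE downscale factor 8); the helper `latent_shape` is hypothetical and not part of the script above.

```python
def latent_shape(height, width, batch=2, channels=4, vae_scale=8):
    """Shape of the latent tensor for one pipeline call.

    Assumes the standard SDXL VAE (downscale factor 8, 4 latent channels).
    """
    return (batch, channels, height // vae_scale, width // vae_scale)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


inference_sizes = [(1024, 1024), (512, 2048), (2048, 512)]
shapes = [latent_shape(h, w) for h, w in inference_sizes]
counts = [num_elements(s) for s in shapes]

for size, shape, count in zip(inference_sizes, shapes, counts):
    print(size, "->", shape, "elements:", count)
```

All three sizes contain the same number of latent elements, so under these assumptions the resolutions differ in tensor shape but not in the amount of latent data per call.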
The complete error message
Stack trace (most recent call last) in thread 1644:
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc73a15f1f, in
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bc3f9a7, in
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bc3f21c, in
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bc3aa98, in vm::ThreadCtx::TryReceiveAndRun()
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bbdd234, in vm::EpStreamPolicyBase::Run(vm::Instruction*) const
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bbe0537, in vm::Instruction::Compute()
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bbe7918, in vm::OpCallInstructionPolicy::Compute(vm::Instruction*)
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bbe75e9, in
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc6bbe273a, in
Object "/opt/conda/lib/python3.10/site-packages/oneflow/../oneflow.libs/liboneflow-74324398.so", at 0x7fbc633e3d3c, in
Aborted (Signal sent by tkill() 1536 0)
Aborted (core dumped)
Additional context
However, without onediff acceleration, the same script runs fine with a batch size of 2.
Watch the GPU memory usage while it runs; this may be an OOM.
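One way to do that check is to poll `nvidia-smi` around each pipeline call. The sketch below is only an illustration: the helper names `parse_memory` and `gpu_memory` are made up here, and it assumes `nvidia-smi` is on the PATH. Because `nvidia-smi` reports device-wide usage, the number includes memory held by both PyTorch and OneFlow.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse nvidia-smi CSV output into (used_mib, total_mib) tuples, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi once and return the parsed memory usage for every GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)
```

Calling `gpu_memory()` before and after each `pipeline(...)` call in the reproduction script would show how close each resolution gets to the 24 GB limit.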
I checked, and GPU memory usage does go up with onediff. Is there any way to reduce it?
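See here: https://github.com/siliconflow/onediff/issues/605#issuecomment-1980574638
It is caused by the GPU memory pool not being shared. The current version has no good way to handle this. We plan to fix it in the next major release, but that will take some time.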
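A toy model of why unshared pools raise total usage, under the assumption that each framework's caching allocator keeps its own high-water mark reserved instead of returning memory to the device. The timeline values are hypothetical GiB figures chosen for illustration, not measurements from this issue.

```python
def reserved_with_separate_pools(timeline):
    """Each pool keeps its own peak reserved, so the device must hold both peaks."""
    torch_peak = max(torch_use for torch_use, _ in timeline)
    oneflow_peak = max(oneflow_use for _, oneflow_use in timeline)
    return torch_peak + oneflow_peak


def reserved_with_shared_pool(timeline):
    """One shared pool only needs to cover the largest combined usage at any moment."""
    return max(torch_use + oneflow_use for torch_use, oneflow_use in timeline)


# Hypothetical (torch_in_use, oneflow_in_use) pairs in GiB at successive moments.
timeline = [(6, 0), (2, 10), (5, 3)]

separate = reserved_with_separate_pools(timeline)
shared = reserved_with_shared_pool(timeline)
print("separate pools:", separate, "GiB; shared pool:", shared, "GiB")
```

In this toy timeline the two frameworks never peak at the same moment, so separate pools reserve more than a shared pool would need.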
OK, I'll wait for the next release then. Thanks a lot.
https://github.com/siliconflow/onediff/tree/main/onediff_diffusers_extensions/examples/sd3
@lovejing0306 Please try nexfort by following this example. There the memory pool is shared with torch, so it adds almost no extra GPU memory.