A method to reduce VRAM usage (for reference)
In the function def load_pipeline(self, weight_dtype) in ComfyUI-IDM-VTON\src\nodes\pipeline_loader.py, make two changes:
Change 1> Comment out every .to(DEVICE) call, every single one.
Change 2> At the end of the function.
Before:
pipe.unet_encoder = unet_encoder
pipe = pipe.to(DEVICE)
pipe.weight_dtype = weight_dtype
After (the full modified function is below):
Tested on a 12 GB GPU with no pressure at all. Observed VRAM usage is a bit over 6 GB, so it can probably also run on 8 GB.
def load_pipeline(self, weight_dtype):
    if weight_dtype == "float32":
        weight_dtype = torch.float32
    elif weight_dtype == "float16":
        weight_dtype = torch.float16
    elif weight_dtype == "bfloat16":
        weight_dtype = torch.bfloat16
    noise_scheduler = DDPMScheduler.from_pretrained(
        WEIGHTS_PATH,
        subfolder="scheduler"
    )
    vae = AutoencoderKL.from_pretrained(
        WEIGHTS_PATH,
        subfolder="vae",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    unet = UNet2DConditionModel.from_pretrained(
        WEIGHTS_PATH,
        subfolder="unet",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    image_encoder = CLIPVisionModelWithProjection.from_pretrained(
        WEIGHTS_PATH,
        subfolder="image_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    unet_encoder = UNet2DConditionModel_ref.from_pretrained(
        WEIGHTS_PATH,
        subfolder="unet_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    text_encoder_one = CLIPTextModel.from_pretrained(
        WEIGHTS_PATH,
        subfolder="text_encoder",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    text_encoder_two = CLIPTextModelWithProjection.from_pretrained(
        WEIGHTS_PATH,
        subfolder="text_encoder_2",
        torch_dtype=weight_dtype
    ).requires_grad_(False).eval()  # .to(DEVICE)
    tokenizer_one = AutoTokenizer.from_pretrained(
        WEIGHTS_PATH,
        subfolder="tokenizer",
        revision=None,
        use_fast=False,
    )
    tokenizer_two = AutoTokenizer.from_pretrained(
        WEIGHTS_PATH,
        subfolder="tokenizer_2",
        revision=None,
        use_fast=False,
    )
    pipe = TryonPipeline.from_pretrained(
        WEIGHTS_PATH,
        unet=unet,
        vae=vae,
        feature_extractor=CLIPImageProcessor(),
        text_encoder=text_encoder_one,
        text_encoder_2=text_encoder_two,
        tokenizer=tokenizer_one,
        tokenizer_2=tokenizer_two,
        scheduler=noise_scheduler,
        image_encoder=image_encoder,
        torch_dtype=weight_dtype,
    )
    pipe.weight_dtype = weight_dtype
    pipe.unet_encoder = unet_encoder
    pipe.enable_sequential_cpu_offload()
    pipe.unet_encoder.to(DEVICE)
    # pipe.to(DEVICE)
    return (pipe,)
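For anyone wondering where the saving comes from: pipe.enable_sequential_cpu_offload() is a standard diffusers API (it requires accelerate to be installed) that keeps every submodule in CPU RAM and copies it to the GPU only for its own forward pass, so peak VRAM is roughly the largest single submodule instead of the whole pipeline. unet_encoder is attached after from_pretrained and is not a registered pipeline component, which is why it still has to be moved to the GPU by hand. A minimal standalone sketch of the same pattern; the SDXL model id is only an example, not part of this repo:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model only
    torch_dtype=torch.float16,
)

# Instead of pipe.to("cuda"): submodules are streamed to the GPU one at a
# time during inference and returned to CPU RAM afterwards. This trades
# speed (extra CPU<->GPU copies on every step) for a much lower VRAM peak.
pipe.enable_sequential_cpu_offload()

image = pipe("a person wearing a shirt", num_inference_steps=20).images[0]

Expect inference to be slower, since weights are copied back and forth on every step; diffusers also offers enable_model_cpu_offload(), which moves whole submodels instead of individual layers and is a faster middle ground when VRAM allows.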
Will this make it slower?
Does this work?
My 4090 (with 24 GB VRAM) still OOMs :-( Can anybody help? :-)
Works without problems.
Wow that's awesome! Thanks! Could you open a PR with these changes?
OK, I have submitted it for review. I also think it would be better to add a lowvram option.
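Not the actual PR, just a rough sketch of how a lowvram toggle could be wired in; the helper name and flag are hypothetical:

import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def place_pipeline(pipe, low_vram=True):
    """Hypothetical helper: pick full-GPU or offloaded placement."""
    if low_vram:
        # Lowest VRAM: stream submodules to the GPU on demand.
        pipe.enable_sequential_cpu_offload()
        pipe.unet_encoder.to(DEVICE)  # not a registered component, move it manually
    else:
        # Original behavior: keep everything resident on the GPU.
        pipe = pipe.to(DEVICE)
    return pipe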
Hey, why am I still getting an out-of-memory error even though I changed the code exactly as you described?
Thanks for sharing this tip, it works fine.
Wow awesome, thank you so much for this finding! Could you create a PR for this?
After upgrading, this method throws an error and vton can no longer be imported. How should I fix it?
It works, bro!
You're a legend. I spent a whole week on this and hit OOM every single time, even with dual 4070 Ti Super 16 GB cards; I never expected such a simple fix. A reminder for anyone who can't find the idm-vton node after making the change: the replaced lines must be indented with spaces, not tabs. That issue cost me an hour too.