diffusers
Best Way To Load Multiple Fine-Tuned Models?
Describe the bug
I am trying to load up to 4 fine-tuned models using pipelines.
Here is what my code looks like:
pipe1 = StableDiffusionPipeline.from_pretrained(model_path1, revision="fp16", torch_dtype=torch.float16)
pipe2 = StableDiffusionPipeline.from_pretrained(model_path2, revision="fp16", torch_dtype=torch.float16)
pipe3 = StableDiffusionPipeline.from_pretrained(model_path3, revision="fp16", torch_dtype=torch.float16)
pipe4 = StableDiffusionPipeline.from_pretrained(model_path4, revision="fp16", torch_dtype=torch.float16)
As you can imagine, this is a painfully slow and memory-inefficient way to do it.
How can I load multiple fine-tuned models, something like custom_pipeline="stable_diffusion_mega"?
Or pass an array of paths and have them load memory-efficiently?
Or, like:
img2text = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
img2img = StableDiffusionImg2ImgPipeline(**img2text.components)
inpaint = StableDiffusionInpaintPipeline(**img2text.components)
Reproduction
No response
Logs
No response
System Info
diffusers 6.0 Python 3.1
Try using joblib
and cache each pipe to disk. It caches the pipe, its models, and their in-memory state to disk, and they load in ~2 seconds (1 second on my system, 2 seconds on Colab). So switching between pipe types is very quick.
Example:
joblib.dump(StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, use_auth_token=True).to(device), f'{PIPE_CACHE}/LOW_VRAM_PIPE.obj')
and
pipe = joblib.load(f'{PIPE_CACHE}/LOW_VRAM_PIPE.obj')
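For reference, a minimal self-contained sketch of this caching approach; PIPE_CACHE is a placeholder directory, the model ID is the one used elsewhere in this thread, and a CUDA device is assumed. As noted further down the thread, the cached object is only reliably loadable within the same session/environment.
import os
import joblib
import torch
from diffusers import StableDiffusionPipeline

PIPE_CACHE = "./pipe_cache"  # placeholder cache directory
os.makedirs(PIPE_CACHE, exist_ok=True)

# Build the pipeline once and dump the whole object to a single file on disk
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
joblib.dump(pipe, f"{PIPE_CACHE}/sd_v1_4_fp16.obj")

# Later (same session/environment): reload in seconds instead of re-reading the weight folder
pipe = joblib.load(f"{PIPE_CACHE}/sd_v1_4_fp16.obj")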
Good question - at the moment the most memory-efficient and fastest way to load the models is via device_map="auto":
pipe1 = StableDiffusionPipeline.from_pretrained(model_path1, revision="fp16", torch_dtype=torch.float16, device_map="auto")
pipe2 = StableDiffusionPipeline.from_pretrained(model_path2, revision="fp16", torch_dtype=torch.float16, device_map="auto")
pipe3 = StableDiffusionPipeline.from_pretrained(model_path3, revision="fp16", torch_dtype=torch.float16, device_map="auto")
pipe4 = StableDiffusionPipeline.from_pretrained(model_path4, revision="fp16", torch_dtype=torch.float16, device_map="auto")
This should cut loading time by 80-90%. Soon we'll have this enabled by default (cc @patil-suraj).
@adhikjoshi, if you only fine-tuned the unet, then the second option would be much more efficient, combined with device_map="auto". For example, if you have multiple fine-tuned unets:
pipe1 = StableDiffusionPipeline.from_pretrained(model_path1, revision="fp16", torch_dtype=torch.float16, device_map="auto")
unet_2 = UNet2DConditionModel.from_pretrained(model_path2, subfolder="unet", revision="fp16", torch_dtype=torch.float16, device_map="auto")
components = pipe1.components
components["unet"] = unet_2
pipe2 = StableDiffusionPipeline(**components)
This will avoid loading the vae and text encoder multiple times since they are fixed.
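A slightly expanded sketch of this pattern, assuming several fine-tuned checkpoints (the path/to/... names are placeholders) that differ only in their UNet:
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load the first pipeline in full; its VAE, text encoder, tokenizer, scheduler, etc. get reused
base_pipe = StableDiffusionPipeline.from_pretrained("path/to/finetune_1", revision="fp16", torch_dtype=torch.float16, device_map="auto")

pipes = {"finetune_1": base_pipe}
for name in ["finetune_2", "finetune_3", "finetune_4"]:  # placeholder checkpoint names
    # Only the fine-tuned UNet is loaded from each additional checkpoint
    unet = UNet2DConditionModel.from_pretrained(f"path/to/{name}", subfolder="unet", revision="fp16", torch_dtype=torch.float16, device_map="auto")
    components = dict(base_pipe.components)
    components["unet"] = unet
    pipes[name] = StableDiffusionPipeline(**components)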
Wouldn't enabling device_map="auto" by default be a bad idea, since it's not compatible with CPU devices, and likely others? I don't think it's compatible with half floats on CPU either (the LayerNormKernelImpl error).
It is compatible with CPU devices. Feel free to try it out
Well, I have; hence the erroring kernel I cited, which gets triggered because that op can't run in half precision on CPU. There are a lot of quirks with using the CPU with diffusers right now, such as severely hindered performance (only 50-65% CPU utilization, alternating back and forth).
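For what it's worth, the half-precision error on CPU can be sidestepped by loading in float32 when no GPU is available; a minimal sketch (the model ID is the one from this thread):
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# PyTorch CPU kernels such as LayerNorm are not implemented for float16,
# so fall back to float32 when running on the CPU
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=dtype).to(device)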
What's your hardware/system configuration that lets you load in ~1 second?
CPU/RAM/NVMe?
No clue what Colab uses for CPUs, but with a Pro account it gives you 64 GB of RAM. No clue about storage either. It's about the same on my system though (aside from the online model-file checks), which is a 13th-gen i9-13900KF / 32 GB RAM / 6200 RPM HDD.
This method is no longer valid across Python versions/sessions/environments because of the diffusers_modules fake-module mechanism applied in https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/__init__.py#L77.
Because of this, when you load a cached pipe in a new Python session it throws an exception that it can't find the class file for diffusers_modules, which it will never be able to find, since that module doesn't exist on disk and is only used internally by diffusers. The pipes will still load within the same session, though, which still offers some speedup.
It's unfortunate that this is no longer viable, since diffusers' weight folders are unwieldy and loading the individual files one by one from I/O is much slower than loading a single file like a ckpt or a cached object.
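One workaround that does survive new sessions is to rely on diffusers' own serialization instead of pickling the pipeline object: save_pretrained() writes a plain local weight folder that from_pretrained() can always reload. It's still a folder of individual files rather than a single cached object, but it avoids the diffusers_modules pickling problem (the local path below is a placeholder):
import torch
from diffusers import StableDiffusionPipeline

# Save once: writes configs + weights to a local folder, no pickled classes involved
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.save_pretrained("./sd_v1_4_local")  # placeholder path

# Reload in any later session/environment without hitting the network
pipe = StableDiffusionPipeline.from_pretrained("./sd_v1_4_local", torch_dtype=torch.float16)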
I've tried using pickle and it's faster than joblib.
Also, I load a different session outside of pickle, which overwrites the original session.
It's a bit more work, but there's no way around it.
The issue with pickle is that it doesn't do memory mapping for the objects in RAM/VRAM. joblib captures the entire state of an object and caches it, which is good for production and for debugging across systems.
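To illustrate the memory-mapping point, joblib.load() accepts an mmap_mode argument, so large numpy arrays inside a dumped object are mapped from disk instead of copied into RAM. (This applies to numpy arrays, not torch tensors, so it's a sketch of the joblib feature itself rather than of the pipeline use case; the path is a placeholder.)
import joblib
import numpy as np

state = {"weights": np.zeros((1024, 1024), dtype=np.float32), "name": "demo"}
joblib.dump(state, "/tmp/state.obj")

# Memory-map the array data instead of reading it all into RAM
loaded = joblib.load("/tmp/state.obj", mmap_mode="r")
print(type(loaded["weights"]))  # numpy.memmap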
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.