Dreambooth-Stable-Diffusion
This is a fix to get stable_txt2img working on an M1 Mac.
- Install Stable Diffusion as per https://medium.com/gft-engineering/macbook-m1-how-to-install-and-run-stable-diffusion-7bfb2f802b1a
- Install PyTorch by running:
conda install pytorch torchvision torchaudio -c pytorch-nightly
Running with more than one sample seems to break it, so I'm just running multiple iterations to get the regularization images:
python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 --scale 10.0 --ddim_steps 50 --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"
How long is it taking you to train the models this way?
I can't get this repo (not the lstein one mentioned by OP) to train on M1. I was able to patch my way along until I didn't get any visible errors, but inevitably got stuck with training never progressing (epoch 0).
@swankwc I haven't gotten to training yet. ATM his patch is just for stable_txt2img.py
which took 1484.16s user 6134.54s system 24% cpu 8:30:55.51 total.
I'm having problems getting main.py to run. Even if I comment out all the CUDA code and change the Trainer to MPS, I'm still getting a CUDA error in trainer.fit:
Traceback (most recent call last):
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 836, in <module>
trainer.fit(model, data)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
self._run_sanity_check()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
val_loop.run()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
output = self._evaluation_step(**kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
_, loss_dict_no_ema = self.shared_step(batch)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
loss = self(x, c)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
c = self.get_learned_conditioning(c)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
return self(text, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 318, in forward
tokens = batch_encoding["input_ids"].to(self.device)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
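The bottom of that traceback points at the likely culprit: ldm/modules/encoders/modules.py moves the tokenized prompt to self.device, and that device apparently still defaults to CUDA even when the Trainer runs on MPS. A minimal sketch of the kind of change that addresses it (the pick_device helper and the attribute names are assumptions for illustration, not this repo's code):
import torch

def pick_device():
    # Assumption: prefer MPS on Apple Silicon, otherwise fall back to CPU; anything
    # still defaulting to "cuda" trips the "Torch not compiled with CUDA enabled"
    # assertion from torch/cuda/__init__.py on an M1 Mac.
    return torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Illustrative use inside the frozen CLIP text encoder (not the repo's exact code):
# self.device = pick_device()
# tokens = batch_encoding["input_ids"].to(self.device)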
I can keep using this PR to track that work or open a new one. Opinions?
@Sorrow do you have your work somewhere? You seem to have gotten further than me. Maybe we can collaborate. Here's my WIP (it's very rough ATM): https://github.com/beettlle/Dreambooth-Stable-Diffusion/tree/m1-training-fix
Renamed PR to explain the scope of work better.
@beettlle, I've been able to get it up and running on my Macbook Pro with some modifications using your code. It's linked here if you'd like to take a look: https://github.com/SujeethJinesh/DreamBoothMac
That's awesome @SujeethJinesh ! Let me reset my env and I'll try it tomorrow.
@SujeethJinesh I'm still getting the following error with your branch. Any ideas?
% python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n ramona --gpus 0, --data_root ~/Downloads/ramona --reg_data_root outputs/txt2img-samples --class_word ramona
<gobs and gobs of stuff>
Traceback (most recent call last):
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 806, in <module>
trainer.fit(model, data)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
self._run_sanity_check()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
val_loop.run()
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
output = self._evaluation_step(**kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
_, loss_dict_no_ema = self.shared_step(batch)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
loss = self(x, c)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
c = self.get_learned_conditioning(c)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
return self(text, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 319, in forward
z = self.transformer(input_ids=tokens, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 297, in transformer_forward
return self.text_model(
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 258, in text_encoder_forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, embedding_manager=embedding_manager)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 180, in embedding_forward
inputs_embeds = self.token_embedding(input_ids)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2206, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
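For what it's worth, "Placeholder storage has not been allocated on MPS device!" usually means a module's weights and its inputs ended up on different devices (e.g. the token embedding still on CPU while the input IDs were moved to MPS). A minimal toy illustration of the mismatch and the fix, purely as a sketch and not this repo's code:
import torch
import torch.nn as nn

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
emb = nn.Embedding(10, 4)                     # weights are created on CPU
ids = torch.tensor([1, 2, 3], device=device)  # inputs placed on MPS (when available)
# Calling emb(ids) here would mix CPU weights with MPS inputs, the kind of mismatch
# behind the "Placeholder storage" error; moving the module fixes it.
emb = emb.to(device)
print(emb(ids).device)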
Is there any progress (on @SujeethJinesh's build)?
I can't even generate the regularization images on MPS, as it doesn't support double-precision floats but SD requires them: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
I tried running @SujeethJinesh's repo and got the same error as you, @HannesGitH. Some additional things I had to install:
conda install pytorch torchvision torchaudio -c pytorch-nightly
conda install chardet
My latest attempt at a fix was to perform the cast as follows:
class DDIMSampler(object):
    def __init__(self, model, schedule="linear", **kwargs):
        super().__init__()
        self.model = model
        self.ddpm_num_timesteps = model.num_timesteps
        self.schedule = schedule

    def register_buffer(self, name, attr):
        if type(attr) == torch.Tensor:
            if attr.device != torch.device("mps"):
                # cast to float32 while moving to MPS, since MPS has no float64 support
                attr = attr.to(torch.device("mps"), torch.float32)
        setattr(self, name, attr)
but that gave me this error
AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31' | 0/5 [00:00<?, ?it/s]
[1] 1493 abort python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1 10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
@SujeethJinesh I opened a PR https://github.com/SujeethJinesh/DreamBoothMac/pull/3 to fix the float64 error.
I was able to get around the error
AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31' | 0/5 [00:00<?, ?it/s]
[1] 1493 abort python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1 10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
I changed the size of the image to 256 x 256 and that did the trick. It unblocks me for now, but it would be good to figure out a better solution. I will try to fix it later.
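For reference, the reduced-resolution variant of the regularization command would look roughly like this (a sketch assuming stable_txt2img.py exposes the upstream txt2img --H/--W flags, which default to 512):
python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 --scale 10.0 --ddim_steps 50 --H 256 --W 256 --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"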
In my latest attempt I tried to run training:
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n hello_world --gpus 0, --data_root ~/Downloads/couch_images --reg_data_root ~/Downloads/other_images/ --class_word couch_trainversion_314
I get the following error
pytorch_lightning.utilities.exceptions.MisconfigurationException: You passed `devices=1` but haven't specified `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu')` for the devices mapping, got `accelerator='mps'`.
My best guess is that the PyTorch Lightning version specified (1.5.9) doesn't have this feature:
https://lightning.ai/docs/pytorch/stable/accelerators/mps_basic.html
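For reference, the linked docs configure the Trainer roughly like this; the MPS accelerator only exists in newer Lightning releases, so this is a sketch against Lightning 1.6+, not this repo's main.py:
from pytorch_lightning import Trainer

# MPS accelerator support landed in PyTorch Lightning 1.6+; on the pinned 1.5.9,
# passing accelerator="mps" is what raises the MisconfigurationException above.
trainer = Trainer(accelerator="mps", devices=1)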
@SujeethJinesh how did you get this to work?
@alberto-salinas would you mind trying the following from PyTorch's site in your environment to see if MPS is supported?
import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")
Output should be:
tensor([1.], device='mps:0')