Dreambooth-Stable-Diffusion

This is a fix to get stable_txt2img working on an M1 Mac.

Open beettlle opened this issue 1 year ago • 14 comments

Running with more than one sample seems to break it, so I'm just running multiple iterations to get the regularization images:

python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 --scale 10.0 --ddim_steps 50 --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"

beettlle avatar Sep 27 '22 01:09 beettlle

How long is it taking you to train the models this way?

swankwc avatar Sep 28 '22 04:09 swankwc

I can't get this repo (not the lstein one mentioned by OP) to train on M1. I was able to patch my way along until I didn't get any visible errors, but inevitably got stuck with training never progressing (epoch 0).

Sorrow avatar Sep 28 '22 14:09 Sorrow

@swankwc I haven't gotten to training yet. ATM his patch is just for stable_txt2img.py, which took 1484.16s user 6134.54s system 24% cpu 8:30:55.51 total.

I'm having problems getting main.py to run. Even if I comment out all the CUDA code and change the Trainer to MPS, I'm still getting a CUDA error in trainer.fit:

Traceback (most recent call last):
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 836, in <module>
    trainer.fit(model, data)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
    _, loss_dict_no_ema = self.shared_step(batch)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
    loss = self(x, c)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
    c = self.get_learned_conditioning(c)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
    return self(text, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 318, in forward
    tokens = batch_encoding["input_ids"].to(self.device)        
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")

I can keep using this PR to track that work or open a new one. Opinions?

beettlle avatar Sep 28 '22 17:09 beettlle

@Sorrow do you have your work somewhere? You seem to have gotten further than me. Maybe we can collaborate. Here's my WIP, it's very rough ATM https://github.com/beettlle/Dreambooth-Stable-Diffusion/tree/m1-training-fix

beettlle avatar Sep 28 '22 17:09 beettlle

Renamed the PR to better explain the scope of work.

beettlle avatar Sep 28 '22 20:09 beettlle

@beettlle, I've been able to get it up and running on my MacBook Pro with some modifications using your code. It's linked here if you'd like to take a look: https://github.com/SujeethJinesh/DreamBoothMac

SujeethJinesh avatar Nov 01 '22 05:11 SujeethJinesh

That's awesome @SujeethJinesh ! Let me reset my env and I'll try it tomorrow.

beettlle avatar Nov 02 '22 01:11 beettlle

@SujeethJinesh I'm still getting the following error with your branch. Any ideas?

% python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n ramona --gpus 0, --data_root ~/Downloads/ramona --reg_data_root outputs/txt2img-samples --class_word ramona
<gobs and gobs of stuff>
Traceback (most recent call last):
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 806, in <module>
    trainer.fit(model, data)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
    _, loss_dict_no_ema = self.shared_step(batch)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
    loss = self(x, c)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
    c = self.get_learned_conditioning(c)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
    return self(text, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 319, in forward
    z = self.transformer(input_ids=tokens, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 297, in transformer_forward
    return self.text_model(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 258, in text_encoder_forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, embedding_manager=embedding_manager)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 180, in embedding_forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2206, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
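This error typically means the model's weights are on MPS while the input tensors are still on CPU, so the batch needs an explicit move. A standalone sketch of the mismatch and the fix (illustrative, not the repo's code):

import torch
import torch.nn as nn

mps = torch.device("mps")          # assumes torch.backends.mps.is_available()
emb = nn.Embedding(10, 4).to(mps)  # weights live on MPS

ids = torch.tensor([1, 2, 3])      # inputs left on CPU ...
# emb(ids)                         # ... raises "Placeholder storage has not been
                                   #     allocated on MPS device!"
print(emb(ids.to(mps)).device)     # moving the inputs fixes it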

beettlle avatar Nov 02 '22 19:11 beettlle

Is there any progress (in @SujeethJinesh's build)?

I can't even generate the regularization images on MPS, since it doesn't support double-precision floats but SD requires them:

Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
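A minimal illustration of the constraint, for reference: float64 tensors have to be downcast to float32 before (or while) moving to MPS:

import torch

x64 = torch.randn(4, dtype=torch.float64)
# x64.to("mps")                     # raises the float64 error quoted above
x32 = x64.to("mps", torch.float32)  # cast and move in one step
print(x32.dtype, x32.device)        # torch.float32 mps:0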

HannesGitH avatar Dec 08 '22 10:12 HannesGitH

I tried running @SujeethJinesh's repo and got the same error as you, @HannesGitH. Some additional things: I had to install the following

conda install pytorch torchvision torchaudio -c pytorch-nightly
conda install chardet

alberto-salinas avatar Apr 27 '23 05:04 alberto-salinas

My latest attempt at a fix was to perform the cast as follows:

import torch

class DDIMSampler(object):
    def __init__(self, model, schedule="linear", **kwargs):
        super().__init__()
        self.model = model
        self.ddpm_num_timesteps = model.num_timesteps
        self.schedule = schedule

    def register_buffer(self, name, attr):
        # Move every tensor buffer onto MPS, downcasting to float32 on the way,
        # since the MPS backend has no float64 support.
        if type(attr) == torch.Tensor:
            if attr.device != torch.device("mps"):
                attr = attr.to(torch.device("mps"), torch.float32)
        setattr(self, name, attr)

but that gave me this error:

AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'               | 0/5 [00:00<?, ?it/s]
[1]    1493 abort      python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1  10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

alberto-salinas avatar Apr 27 '23 05:04 alberto-salinas

@SujeethJinesh I opened a PR, https://github.com/SujeethJinesh/DreamBoothMac/pull/3, to fix the float64 error.

I was able to get around the error

AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'               | 0/5 [00:00<?, ?it/s]
[1]    1493 abort      python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1  10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

I changed the image size to 256 x 256 and that did the trick. It unblocks me for now, but it would be good to figure out a better solution. I will try to fix it later.
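Assuming stable_txt2img.py exposes the standard --H/--W flags from the upstream txt2img script, the reduced-resolution run looks something like:

python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 \
    --scale 10.0 --ddim_steps 50 --H 256 --W 256 \
    --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"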

alberto-salinas avatar Apr 27 '23 06:04 alberto-salinas

In my latest attempt I tried to run training:

 python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml  -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n hello_world --gpus 0, --data_root ~/Downloads/couch_images --reg_data_root ~/Downloads/other_images/ --class_word couch_trainversion_314

I get the following error

pytorch_lightning.utilities.exceptions.MisconfigurationException: You passed `devices=1` but haven't specified `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu')` for the devices mapping, got `accelerator='mps'`.

My best guess is that the pytorch-lightning version specified (1.5.9) doesn't have this feature:

https://lightning.ai/docs/pytorch/stable/accelerators/mps_basic.html
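For reference, on pytorch-lightning >= 1.7 (where the MPS accelerator landed) the selection looks roughly like this; 1.5.9 simply doesn't recognize it:

import pytorch_lightning as pl

# Requires pytorch-lightning >= 1.7; earlier releases reject accelerator="mps".
trainer = pl.Trainer(accelerator="mps", devices=1)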

@SujeethJinesh how did you get this to work?

alberto-salinas avatar Apr 27 '23 07:04 alberto-salinas

@alberto-salinas would you mind trying the following snippet from PyTorch's site in your environment to see if MPS is supported?

import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print (x)
else:
    print ("MPS device not found.")

Output should be: tensor([1.], device='mps:0')

beettlle avatar May 04 '23 01:05 beettlle