
No MPS support right?

Open rmasiso opened this issue 1 year ago • 11 comments

Just to be clear, this repo is for CUDA-enabled devices only, correct? On initial testing, MPS doesn't seem to work.

rmasiso avatar Dec 23 '23 16:12 rmasiso

Yes, that is correct.

MPS is not supported. However, if we can further speed up the process using MPS, we will try it.

If you know anything about it, we would appreciate your advice.

teftef6220 avatar Dec 23 '23 16:12 teftef6220

In case someone is wondering where to start, or wants to try the project out on their Mac machine:

To run image-to-image or text-to-image from the README example without acceleration:

pipe.enable_xformers_memory_efficient_attention()  # <-- NADA, remove/comment this

and move the pipeline to "mps":

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("mps"),
    dtype=torch.float16,
)

I'm not sure about xformers (I'm not an expert), but check the issue, as it might not be needed.
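For a script that keeps working on CUDA machines too, one option is to gate both the device choice and the xformers call on what's available. A minimal sketch of my own (the device-picking logic is an assumption, not from the repo):

import torch
from diffusers import StableDiffusionPipeline

# Prefer CUDA, fall back to MPS on Apple Silicon, then CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device(device),
    dtype=torch.float16,  # note: float16 on CPU is slow/limited; this is just a sketch
)

# xformers is CUDA-only, so skip it everywhere else.
if device == "cuda":
    pipe.enable_xformers_memory_efficient_attention()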

I had to modify the StreamDiffusion class's __call__ method in the pipeline to conditionally run the CUDA event wrapping...

It lives somewhere in .../StreamDiffusion/venv/lib/python3.xx/site-packages/streamdiffusion/pipeline.py if installed into the venv via pip install . from the repo root...

    @torch.no_grad()
    # conditional hack: only create/sync CUDA timing events on CUDA devices (RIP profiling elsewhere)
    def __call__(
        self, x: Union[torch.Tensor, PIL.Image.Image, np.ndarray] = None
    ) -> torch.Tensor:
        if self.device == "cuda":
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
        if x is not None:
            x = self.image_processor.preprocess(x, self.height, self.width).to(
                device=self.device, dtype=self.dtype
            )
            if self.similar_image_filter:
                x = self.similar_filter(x)
                if x is None:
                    # pace skipped frames using the inference-time EMA
                    # (stale off-CUDA, since the EMA is only updated below)
                    time.sleep(self.inference_time_ema)
                    return self.prev_image_result
            x_t_latent = self.encode_image(x)
        else:
            # TODO: check the dimension of x_t_latent
            x_t_latent = torch.randn((1, 4, self.latent_height, self.latent_width)).to(
                device=self.device, dtype=self.dtype
            )
        x_0_pred_out = self.predict_x0_batch(x_t_latent)
        x_output = self.decode_image(x_0_pred_out).detach().clone()

        self.prev_image_result = x_output
        if self.device == "cuda":
            end.record()
            torch.cuda.synchronize()
            inference_time = start.elapsed_time(end) / 1000
            self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time
        return x_output
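If you'd rather keep the timing (and the time.sleep pacing above) working off-CUDA instead of skipping it, one alternative is wall-clock timing with an explicit synchronize. A hedged sketch, not what I actually ran; torch.mps.synchronize needs a reasonably recent PyTorch:

import time
import torch

def _sync(device: str) -> None:
    # Flush queued GPU work so perf_counter deltas measure real inference time.
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "mps":
        torch.mps.synchronize()

# inside __call__, instead of the CUDA events:
#     _sync(self.device)
#     start_time = time.perf_counter()
#     ... run the pipeline ...
#     _sync(self.device)
#     inference_time = time.perf_counter() - start_time
#     self.inference_time_ema = 0.9 * self.inference_time_ema + 0.1 * inference_time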

leezenn avatar Dec 26 '23 01:12 leezenn

@leezenn Thanks for the suggestion. How did you install the streamdiffusion library? I guess in installation Step 3 we need to remove [tensorrt], right? Do we need any extra steps?

ifsheldon avatar Dec 29 '23 10:12 ifsheldon

@ifsheldon I've installed it via pip install .; the git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt] route didn't work, AFAIR.

Here are all the steps I performed at the project root (you can copy and execute this shell script) (outdated; read to the very end first):

python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install wheel
pip install xformers
pip install accelerate

pip install .

deactivate
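Before moving on, it's worth sanity-checking that the installed torch actually sees MPS. A quick probe of my own (not part of the repo), run inside the venv:

import torch

# Both should print True on an Apple Silicon build of PyTorch.
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())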

Step 3, we need to remove [tensorrt] right?

I'm not sure, though. Do you? (UPD) Yeah, remove it, as it doesn't work. No Nvidia, RIP.

As for the demo server, I changed the acceleration to sfast in the config at .../StreamDiffusion/demo/realtime-txt2img/server/config.py:

    # ...
    device: torch.device = torch.device("mps")
    # ...
    acceleration: Literal["none", "xformers", "tensorrt"] = "sfast"
    # ...
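Note that "sfast" isn't in that Literal, so the default doesn't type-check as written. A hedged sketch of the widened annotation (assuming Literal is already imported in that config):

    # ...
    acceleration: Literal["none", "xformers", "tensorrt", "sfast"] = "sfast"
    # ...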

and run:

source venv/bin/activate
cd demo/realtime-txt2img/

pip install -r requirements.txt 
cd view && npm install && npm run build && cd ..
cd server && python main.py

deactivate

cd ../../../

Note:

~~I had to install the wheel package before xformers (the xformers installation fails without it; check the installation steps above) in order for it to work. OR maybe it was accelerate, I don't remember at this point. 😮‍💨~~


Never mind; I just

  • (re)installed everything without xformers and (optionally) accelerate:
python -m venv venv
source venv/bin/activate

pip install --upgrade pip

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# pip install wheel
# pip install xformers
# pip install accelerate

pip install .

deactivate

For Torch, I used the Apple guide.

leezenn avatar Dec 29 '23 15:12 leezenn

@leezenn Thanks a lot! I've successfully run it. But I wonder if you can run it with sfast? I don't know what it is; I cannot find it anywhere, in the code or on PyPI.

from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile seems to import something from nowhere.
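For what it's worth, wrapping that import in a guard makes the missing dependency fail loudly instead of at first use. A small sketch of my own (assuming sfast is simply absent, not misnamed):

try:
    from sfast.compilers.stable_diffusion_pipeline_compiler import (
        CompilationConfig,
        compile,
    )
except ImportError as exc:
    # Not in this repo and (at the time of writing) not findable on PyPI either.
    raise RuntimeError(
        "sfast is not installed; the 'sfast' acceleration path is unavailable"
    ) from exc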

ifsheldon avatar Jan 02 '24 11:01 ifsheldon

@ifsheldon Sorry for the delay.

from sfast.compilers.stable_diffusion_pipeline_compiler import CompilationConfig, compile seems to import something from nowhere.

I know right? :)

I'm glad you're as confused as I am.

I haven't dug into this much, but yeah, the repo is missing the compiler part. A quick search on GitHub gives me this from this repo. I didn't spend time investigating what it does; I just used the tip from the docstring that contains it to try it out. It runs silently (it may log at another level, I don't know), and then I saw inconsistencies with the docstrings. So... this is not well cooked (yet?). I just let it be. I wasn't particularly patient with it, I'm sorry.

The project seems promising, though. ❤️

leezenn avatar Jan 09 '24 03:01 leezenn

@leezenn @ifsheldon I'm not sure if I set everything up the correct way, but I at least got it working after following your conversation.

I was wondering what kind of speed you are getting from this? Running the txt2img demo, it takes around 5-10 seconds for me before it starts producing images, then it shoots out images every 1-2 seconds, and then again takes approx. 10 seconds after new input.

I'm on an M3 Pro with 36 GB; expecting real-time generation will just stay a faraway dream, I guess?

odonald avatar Jan 16 '24 17:01 odonald

@odonald

for me takes around 5-10 seconds till it starts producing images

Most likely due to the so-called warm-up runs.

then it shoots out images every 1-2 seconds and then again takes approx 10 seconds after new input

A similar effect on my M1 Pro. As far as I remember, it was running on the GPU but had some problems with unified RAM, probably even a memory leak. So it started to hit SSD swap and crawl instead of running... I haven't investigated any further after a couple of runs; don't take my words too seriously, these are surface-level assumptions.
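If you want to check whether unified memory really is the bottleneck, a rough probe between generations (my own sketch; these counters need a recent PyTorch):

import torch

# Bytes held by MPS tensors vs. what the Metal driver has reserved overall;
# steady growth across generations hints at a leak.
print("allocated:", torch.mps.current_allocated_memory() / 2**20, "MiB")
print("driver:", torch.mps.driver_allocated_memory() / 2**20, "MiB")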

I don't think your dream is that far away, though. I've seen some projects redesigned for the Apple Silicon machines, somewhere along the lines of ggerganov et al. with their special tools. Something like this project...

So, you can try to adapt the current project using it, or wait until someone does. If the maintainers keep improving this project, I believe somebody will eventually add proper MPS support, unless there is a better alternative.

leezenn avatar Jan 16 '24 21:01 leezenn

@leezenn did you PR the hack? Tbh it works perfectly on my M1 Pro, with no performance decrease or the aforementioned lag/delay issues.

ethrx avatar Apr 18 '24 17:04 ethrx

I've created this gist to help guide setting up and running the demos.

fbarretto avatar Apr 24 '24 21:04 fbarretto

@leezenn did you PR the hack? Tbh it works perfectly on my M1 Pro, with no performance decrease or the aforementioned lag/delay issues.

I did not. Maybe someone else did.

leezenn avatar Apr 25 '24 11:04 leezenn