Kandinsky-2
Fixes to run on CPU and MPS
Some changes required to run it on CPU and other devices.
I tried the above, but I am having this error: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: CUDA
As far as I understand (from the error message), you wrote CUDA in uppercase in your code, while PyTorch expects lowercase device names.
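A minimal illustration of the lowercase convention, using the standard torch.device API:

import torch

# Device strings must be lowercase: torch.device("CUDA") raises the
# RuntimeError above, while "cuda" (or "cpu", "mps", ...) is accepted.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")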
Thank you, I managed to get it to work. Yeah, I was trying to push it to run on the CPU, and in the end the time it takes on the M1 Mac is crazy: around 40 minutes to process. I read in the other thread that it errors out due to low GPU RAM on a 1070 Ti; I ran it on my other Windows laptop and it also errored out due to low GPU RAM. Just dropping these notes for anyone else who reads this thread.
@WojtekKowaluk, thank you for this fix.
@clarklight, I've tested on an M1 SoC with 16GB as well, and it achieves 8-10 seconds per iteration in my case, but you can try to use an `mps` device to enable GPU acceleration on that SoC. I got an improvement to about 3 seconds per iteration, roughly three times faster with `mps`.
@CoruNethron
To run this on the Mac, I have to use the CPU, right, because there is no CUDA on the GPU? I just tested it again running on the CPU, and it is still 120 seconds per iteration. Here is the test code; I changed it to run on the CPU. Am I doing anything incorrectly?
from kandinsky2 import get_kandinsky2

model = get_kandinsky2(
    'cpu',
    task_type='text2img',
    cache_dir='/tmp/kandinsky2',
    model_version='2.1',
    use_flash_attention=False,
)
images = model.generate_text2img(
    "red cat, 4k photo",
    num_steps=25,
    batch_size=1,
    guidance_scale=4,
    h=768,
    w=768,
    sampler='p_sampler',
    prior_cf_scale=4,
    prior_steps="5",
)
@clarklight there is no CUDA support in the GPU, that's correct. But there is support for another acceleration backend on the GPU, and that's `mps`: it can utilize the Apple silicon GPU with torch. So just change `cpu` to `mps`, as you previously changed `cuda` to `cpu`, and it should do the trick. I got about 3 times faster rendering. Also FYI, it takes about 1.25 seconds per iteration on my machine when the resolution is set to 512 by 512. Even faster than Stable Diffusion.
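For anyone following along, a sketch of that swap, same call shape as the earlier snippet (the availability check is my own addition, not from the thread):

import torch
from kandinsky2 import get_kandinsky2

# Guard against torch builds without the MPS backend before switching devices.
assert torch.backends.mps.is_available(), "MPS backend not available"

model = get_kandinsky2(
    'mps',
    task_type='text2img',
    cache_dir='/tmp/kandinsky2',
    model_version='2.1',
    use_flash_attention=False,
)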
@CoruNethron Sweet, thank you! I got it to work! Yes, it's around 1.3 seconds/it, but the output images are not real images, haha. I will try to figure out why.
@clarklight I took some ideas about image export with unique file name here: https://gist.github.com/FurkanGozukara/10bdc0435b708b26bd87a59b6c3d1bc7
@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine.
I'm getting the following error if I try to use img2img with `mps`:
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/maxnowack/code/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/maxnowack/code/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/maxnowack/code/kubin/src/models/model_kd2.py", line 125, in i2i
current_batch = self.kandinsky.generate_img2img(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 466, in generate_img2img
image = q_sample(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/utils.py", line 52, in q_sample
_extract_into_tensor(sqrt_alphas_cumprod, t, x_start.shape) * x_start
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/model/utils.py", line 18, in _extract_into_tensor
res = torch.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
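For reference, a sketch of the kind of fix that resolves this, assuming _extract_into_tensor follows the usual guided-diffusion shape (not necessarily the exact patch applied):

import numpy as np
import torch

def _extract_into_tensor(arr: np.ndarray, timesteps: torch.Tensor, broadcast_shape):
    # Cast to float32 *before* moving to the device: MPS cannot hold float64,
    # which is what the TypeError above complains about.
    res = torch.from_numpy(arr).float().to(device=timesteps.device)[timesteps]
    while len(res.shape) < len(broadcast_shape):
        res = res[..., None]
    return res.expand(broadcast_shape)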
I have fixed that one, but I'm still getting other errors with img2img:
Traceback (most recent call last):
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 127, in i2i
current_batch = self.kandinsky.generate_img2img(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 474, in generate_img2img
return self.generate_img(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 277, in generate_img
samples, _ = sampler.sample(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 178, in sample
self.make_schedule(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 104, in make_schedule
"betas", to_torch(torch.from_numpy(self.old_diffusion.betas))
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 101, in <lambda>
to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
After I change the sampler to p_sampler, I get another one:
Traceback (most recent call last):
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 141, in i2i
saved_batch = save_output(self.output_dir, 'img2img', current_batch, params)
File "/Users/wojtek/Documents/kubin/src/utils/file_system.py", line 38, in save_output
params_as_json = None if params is None else json.dumps(params, skipkeys=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Image is not JSON serializable
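Worth noting why skipkeys=True doesn't help here: it only skips non-string dict keys, while the failure is caused by a non-serializable value (a PIL Image). A hypothetical workaround, not necessarily what the actual fix below does:

import json

def params_to_json(params: dict) -> str:
    # json.dumps raises on values it can't encode even with skipkeys=True,
    # so supply a default= hook that stringifies anything non-serializable.
    return json.dumps(params, skipkeys=True, default=lambda o: f"<{type(o).__name__}>")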
There are still some hardcoded references to `cuda` in the samplers. I think a solution might be to pass the configured device to the samplers and use that instead of `cuda`. I'm quite inexperienced with PyTorch, so I'm not sure what the implications of this might be.
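A sketch of what that could look like, based on the hardcoded line visible in the traceback (samplers.py, line 101); the class structure and the device attribute name are my assumptions:

import torch

class Sampler:
    def __init__(self, old_diffusion, device="cpu"):
        self.old_diffusion = old_diffusion
        self.device = torch.device(device)  # injected by the caller instead of "cuda"

    def make_schedule(self):
        # was: to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
        to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.device)
        self.betas = to_torch(torch.from_numpy(self.old_diffusion.betas))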
I have fixed the samplers; the JSON error I have fixed here: https://github.com/seruva19/kubin/pull/80
@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine.
Is this `plms_sampler`? I think that one is broken. `ddim_sampler` and `p_sampler` should work fine :)
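For example, changing only the sampler argument in the earlier text2img snippet (resolution dropped to 512, as suggested above for faster iterations):

images = model.generate_text2img(
    "red cat, 4k photo",
    num_steps=25,
    batch_size=1,
    guidance_scale=4,
    h=512,
    w=512,
    sampler='ddim_sampler',  # plms_sampler appears broken; ddim/p_sampler work
    prior_cf_scale=4,
    prior_steps="5",
)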
For Mac, MPS can be used. I've also created a pull request to handle `mps` (https://github.com/ai-forever/Kandinsky-2/pull/101/commits/69759df490dc8fbaab0e4846428eef0721b53f9a):
# Prefer CUDA when present, then Apple's MPS backend, then plain CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"