Kandinsky-2
Fixes to run on CPU and MPS
Some changes required to run it on CPU and other devices.
I tried the above, but I am having this error: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: CUDA
As far as I understand (from the error message), you wrote CUDA in uppercase in your code, while PyTorch expects lowercase device names.
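A minimal illustration of the lowercase convention, using the standard torch.device API:

import torch

# Device strings must be lowercase: torch.device("CUDA") raises the
# RuntimeError above, while "cuda" (or "cpu", "mps", ...) is accepted.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")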
Thank you, I managed to get it to work. Yeah, I was trying to push it to run on the CPU, and in the end the time it takes on the M1 Mac is crazy: around 40 minutes to process. I read in the other thread that it errors out due to low GPU RAM on a 1070 Ti; I ran it on my other Windows laptop and it also errored out due to low GPU RAM. Just dropping these notes for anyone else who reads this thread.
@WojtekKowaluk, thank you for this fix.
@clarklight, I've tested on an M1 SoC with 16GB as well, and it achieves 8-10 seconds per iteration in my case, but you can try to use an `mps` device to enable GPU acceleration on that SoC. I got an improvement to about 3 seconds per iteration, roughly three times faster with `mps`.
@CoruNethron
To run this on the Mac, I have to use the CPU, right, because there is no CUDA on the GPU? I just tested it again running on the CPU, and it is still 120 seconds per iteration. Here is the test code; I changed it to run on the CPU. Am I doing anything incorrectly?
from kandinsky2 import get_kandinsky2

model = get_kandinsky2(
    'cpu',
    task_type='text2img',
    cache_dir='/tmp/kandinsky2',
    model_version='2.1',
    use_flash_attention=False,
)
images = model.generate_text2img(
    "red cat, 4k photo",
    num_steps=25,
    batch_size=1,
    guidance_scale=4,
    h=768,
    w=768,
    sampler='p_sampler',
    prior_cf_scale=4,
    prior_steps="5",
)
@clarklight there is no CUDA support in the GPU, that's correct. But there is support for another acceleration backend on the GPU, and that's `mps`: it can utilize the Apple silicon GPU with torch. So just change `cpu` to `mps`, as you previously changed `cuda` to `cpu`, and it should do the trick. I got about 3 times faster rendering. Also FYI, it takes about 1.25 seconds per iteration on my machine when the resolution is set to 512 by 512. Even faster than Stable Diffusion.
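For anyone following along, a sketch of that swap, same call shape as the earlier snippet (the availability check is my own addition, not from the thread):

import torch
from kandinsky2 import get_kandinsky2

# Guard against torch builds without the MPS backend before switching devices.
assert torch.backends.mps.is_available(), "MPS backend not available"

model = get_kandinsky2(
    'mps',
    task_type='text2img',
    cache_dir='/tmp/kandinsky2',
    model_version='2.1',
    use_flash_attention=False,
)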
@CoruNethron Sweet, thank you! I got it to work! Yes, it's around 1.3 seconds/it, but the output images are not real images, haha. I will try to figure out why.
@clarklight I took some ideas about image export with unique file name here: https://gist.github.com/FurkanGozukara/10bdc0435b708b26bd87a59b6c3d1bc7
@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine.
I'm getting the following error if I try to use img2img with `mps`:
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/maxnowack/code/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/maxnowack/code/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/maxnowack/code/kubin/src/models/model_kd2.py", line 125, in i2i
current_batch = self.kandinsky.generate_img2img(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 466, in generate_img2img
image = q_sample(
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/utils.py", line 52, in q_sample
_extract_into_tensor(sqrt_alphas_cumprod, t, x_start.shape) * x_start
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/kandinsky2/model/utils.py", line 18, in _extract_into_tensor
res = torch.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
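For reference, a sketch of the kind of fix that resolves this, assuming _extract_into_tensor follows the usual guided-diffusion shape (not necessarily the exact patch applied):

import numpy as np
import torch

def _extract_into_tensor(arr: np.ndarray, timesteps: torch.Tensor, broadcast_shape):
    # Cast to float32 *before* moving to the device: MPS cannot hold float64,
    # which is what the TypeError above complains about.
    res = torch.from_numpy(arr).float().to(device=timesteps.device)[timesteps]
    while len(res.shape) < len(broadcast_shape):
        res = res[..., None]
    return res.expand(broadcast_shape)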
I have fixed that one, but I'm still getting other errors with img2img:
Traceback (most recent call last):
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 127, in i2i
current_batch = self.kandinsky.generate_img2img(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 474, in generate_img2img
return self.generate_img(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py", line 277, in generate_img
samples, _ = sampler.sample(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 178, in sample
self.make_schedule(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 104, in make_schedule
"betas", to_torch(torch.from_numpy(self.old_diffusion.betas))
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/kandinsky2/model/samplers.py", line 101, in <lambda>
to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
After I change the sampler to p_sampler, I get another one:
Traceback (most recent call last):
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1021, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/wojtek/Documents/kubin/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/wojtek/Documents/kubin/src/ui_blocks/i2i.py", line 65, in generate
return generate_fn(params)
File "/Users/wojtek/Documents/kubin/src/webui.py", line 28, in <lambda>
i2i_ui(generate_fn=lambda params: kubin.model.i2i(params), shared=ui_shared, tabs=ui_tabs)
File "/Users/wojtek/Documents/kubin/src/models/model_kd2.py", line 141, in i2i
saved_batch = save_output(self.output_dir, 'img2img', current_batch, params)
File "/Users/wojtek/Documents/kubin/src/utils/file_system.py", line 38, in save_output
params_as_json = None if params is None else json.dumps(params, skipkeys=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Image is not JSON serializable
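Worth noting why skipkeys=True doesn't help here: it only skips non-string dict keys, while the failure is caused by a non-serializable value (a PIL Image). A hypothetical workaround, not necessarily what the actual fix below does:

import json

def params_to_json(params: dict) -> str:
    # json.dumps raises on values it can't encode even with skipkeys=True,
    # so supply a default= hook that stringifies anything non-serializable.
    return json.dumps(params, skipkeys=True, default=lambda o: f"<{type(o).__name__}>")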
There are still some hardcoded references to `cuda` in the samplers. I think a solution might be to pass the configured device to the samplers and use that instead of `cuda`. I'm quite inexperienced with PyTorch, so I'm not sure what the implications of this might be.
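A sketch of what that could look like, based on the hardcoded line visible in the traceback (samplers.py, line 101); the class structure and the device attribute name are my assumptions:

import torch

class Sampler:
    def __init__(self, old_diffusion, device="cpu"):
        self.old_diffusion = old_diffusion
        self.device = torch.device(device)  # injected by the caller instead of "cuda"

    def make_schedule(self):
        # was: to_torch = lambda x: x.clone().detach().to(torch.float32).to("cuda")
        to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.device)
        self.betas = to_torch(torch.from_numpy(self.old_diffusion.betas))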
I have fixed the samplers; the JSON error I have fixed here: https://github.com/seruva19/kubin/pull/80
@CoruNethron Most of my images are broken for some reason... but if I run it on the web version, it runs fine.
Is this `plms_sampler`? I think that one is broken. `ddim_sampler` and `p_sampler` should work fine :)
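For example, changing only the sampler argument in the earlier text2img snippet (resolution dropped to 512, as suggested above for faster iterations):

images = model.generate_text2img(
    "red cat, 4k photo",
    num_steps=25,
    batch_size=1,
    guidance_scale=4,
    h=512,
    w=512,
    sampler='ddim_sampler',  # plms_sampler appears broken; ddim/p_sampler work
    prior_cf_scale=4,
    prior_steps="5",
)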
For Mac, MPS can be used. I've also created a pull request to handle `mps` (https://github.com/ai-forever/Kandinsky-2/pull/101/commits/69759df490dc8fbaab0e4846428eef0721b53f9a):
# Prefer CUDA when present, then Apple's MPS backend, then plain CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"