
[Feature request] Add support for HiDiffusion

Open MaxTran96 opened this issue 1 year ago • 7 comments

comfyUI-Manager/extension-node-map.json at main · ltdrdata/ComfyUI-Manager

if you can also look into Hyper SD as well, that would be nice

MaxTran96 avatar Apr 24 '24 03:04 MaxTran96

i made a node with the MSW-MSA attention part: https://gist.github.com/blepping/02e389f660112097983684a8ea8093b1

it's a noticeable speed increase for high res generation and seems like it might also increase quality/reduce artifacts.
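for context on what a node like this does, here is a pure-python mock of the clone-and-patch pattern ComfyUI custom nodes typically follow (the real `ModelPatcher` API and patch signatures differ; the class and method names below are illustrative, not ComfyUI's actual ones):

```python
# Mock of the clone-and-patch pattern used by ComfyUI model-patching nodes.
# NOT the real ComfyUI API: class and method names here are hypothetical.

class MockModelPatcher:
    def __init__(self, patches=None):
        self.patches = dict(patches or {})

    def clone(self):
        # nodes patch a clone so the original model object is untouched
        return MockModelPatcher(self.patches)

    def set_attn1_patch(self, fn):
        # register a function that would wrap self-attention inputs
        self.patches["attn1"] = fn


class AttentionPatchNode:
    def patch(self, model):
        m = model.clone()
        # hypothetical no-op patch; a real one would rework q/k/v
        m.set_attn1_patch(lambda q, k, v: (q, k, v))
        return (m,)


base = MockModelPatcher()
(patched,) = AttentionPatchNode().patch(base)
```

the key design point is that the node returns a patched clone, so wiring the node into a workflow never mutates the loaded checkpoint.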

blepping avatar Apr 25 '24 23:04 blepping

> i made a node with the MSW-MSA attention part: https://gist.github.com/blepping/02e389f660112097983684a8ea8093b1
>
> it's a noticeable speed increase for high res generation and seems like it might also increase quality/reduce artifacts.

doesn't seem to make any difference with sdxl, or is it just me? Are we supposed to connect this after the checkpoint and enter higher resolution numbers? Or is it supposed to make even the default resolution faster?

patientx avatar Apr 26 '24 08:04 patientx

> i made a node with the MSW-MSA attention part: https://gist.github.com/blepping/02e389f660112097983684a8ea8093b1 it's a noticeable speed increase for high res generation and seems like it might also increase quality/reduce artifacts.
>
> doesn't seem to make any difference with sdxl, or is it just me? Are we supposed to connect this after the checkpoint and enter higher resolution numbers? Or is it supposed to make even the default resolution faster?

Using HiDiffusion, you can prevent the phenomenon of having eight limbs appear even at high resolutions like 2048x2048.

ltdrdata avatar Apr 26 '24 09:04 ltdrdata

> i made a node with the MSW-MSA attention part: https://gist.github.com/blepping/02e389f660112097983684a8ea8093b1 it's a noticeable speed increase for high res generation and seems like it might also increase quality/reduce artifacts.
>
> doesn't seem to make any difference with sdxl, or is it just me? Are we supposed to connect this after the checkpoint and enter higher resolution numbers? Or is it supposed to make even the default resolution faster?
>
> Using HiDiffusion, you can prevent the phenomenon of having eight limbs appear even at high resolutions like 2048x2048.

I know about that, but this implementation doesn't seem to work at all: it neither speeds things up at normal or high resolutions nor enhances quality at any setting. (I used blocks 2,3 and 3,4,5 as well as the default values, which are meant for sd1.5; neither makes a difference with sdxl.)

patientx avatar Apr 26 '24 09:04 patientx

@patientx

> I know about that, but this implementation doesn't seem to work at all: it neither speeds things up at normal or high resolutions nor enhances quality at any setting. (I used blocks 2,3 and 3,4,5 as well as the default values, which are meant for sd1.5; neither makes a difference with sdxl.)

my node only implements the MSW-MSA attention part, not the RAU-Net part. i haven't done much testing with SDXL, i just checked that i still got reasonable results. with sd1.5 at least, the speed increase is about 30% at high resolution (e.g. 1536x1536) and it may help with artifacts, but the attention part alone doesn't let you generate natively at high resolution. you'll need to combine it with something like deep shrink.
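for anyone curious what the MSW-MSA attention part actually does, here is a toy sketch of shifted-window partitioning, the Swin-style idea behind it (this is an illustration of the technique, not blepping's node code, and `partition_windows` is a made-up helper name):

```python
# Toy sketch of shifted-window attention partitioning (the idea behind
# MSW-MSA), NOT the actual node implementation. Tokens on an h x w grid
# are split into win x win windows; attention then runs per-window,
# which is much cheaper than global attention at high resolution.

def partition_windows(tokens, h, w, win, shift=0):
    """Split a flat list of h*w tokens into win x win windows,
    optionally cyclically shifting the grid first (Swin-style)."""
    assert h % win == 0 and w % win == 0
    # cyclic shift of the token grid by `shift` rows and columns
    grid = [[tokens[((r + shift) % h) * w + ((c + shift) % w)]
             for c in range(w)] for r in range(h)]
    windows = []
    for wr in range(0, h, win):
        for wc in range(0, w, win):
            windows.append([grid[r][c]
                            for r in range(wr, wr + win)
                            for c in range(wc, wc + win)])
    return windows

# 4x4 grid of token ids, 2x2 windows -> 4 windows of 4 tokens each
wins = partition_windows(list(range(16)), 4, 4, 2)
```

varying `shift` between sampling steps is what lets information still mix across window borders even though each attention call only sees one window.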

the default sd1.5 settings aren't going to work properly with SDXL; SDXL doesn't even have an 11th block. hidiffusion's code has:

```
down_blocks.1.attentions.0.transformer_blocks.0
down_blocks.1.attentions.0.transformer_blocks.1
down_blocks.1.attentions.1.transformer_blocks.0
down_blocks.1.attentions.1.transformer_blocks.1
up_blocks.1.attentions.0.transformer_blocks.0
up_blocks.1.attentions.0.transformer_blocks.1
up_blocks.1.attentions.1.transformer_blocks.0
up_blocks.1.attentions.1.transformer_blocks.1
up_blocks.1.attentions.2.transformer_blocks.0
up_blocks.1.attentions.2.transformer_blocks.1
```

for sdxl, but that's the diffusers naming convention. i thought the conversion was first_number*3+second_number but that might be wrong. (down blocks are input, up blocks are output)
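taking blepping's guessed formula at face value (first_number*3 + second_number, which he explicitly flags as possibly wrong), the conversion could be sketched like this, with `diffusers_to_comfy` being a hypothetical helper name:

```python
# Hypothetical diffusers -> ComfyUI block index conversion, based on the
# guessed formula first_number*3 + second_number from the thread.
# blepping notes this might be wrong, so treat it as an assumption.

def diffusers_to_comfy(name):
    """Parse e.g. 'down_blocks.1.attentions.0.transformer_blocks.0'
    and apply the guessed formula. down_blocks correspond to ComfyUI
    input blocks and up_blocks to output blocks."""
    parts = name.split(".")
    block = int(parts[1])   # first number: down/up block index
    attn = int(parts[3])    # second number: attention index
    kind = "input" if parts[0] == "down_blocks" else "output"
    return kind, block * 3 + attn
```

applied to the list above, this would put the sdxl hidiffusion targets at input blocks 3,4 and output blocks 3,4,5 under that (unverified) mapping.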

blepping avatar Apr 26 '24 11:04 blepping

i did some more testing with SDXL. from what i can see, it's probably not really worth using for SDXL at least from a performance standpoint. also the available blocks for SDXL seem to be

input: 4,5,7,8

output: 0,1,2,3,4,5

you can try something like input 4,5 and output 3,4,5; however, even at pretty high res it's only a small speed difference. testing with deep shrink default settings, generating at 2560x1792 i get 3.47IT/s without the attention patch and 3.11IT/s with it.

the higher the resolution, the more steps you use, and the slower the sampler (i.e. samplers that call the model multiple times, like dpmpp_2s_ancestral or dpmpp_sde), the more benefit you'll see. my test was with dpmpp_2s_ancestral.

it does seem like it helps with artifacts at high res, however SDXL in general doesn't seem to tolerate deep shrink very well.

example workflow (should have metadata unless github strips it): attnworkflow

blepping avatar Apr 26 '24 13:04 blepping

experimental implementation of the remaining parts: https://github.com/blepping/comfyui_jankhidiffusion

i strongly recommend reading the README since there are more than a few gotchas.

blepping avatar Apr 27 '24 14:04 blepping

thank you! We can close this out

MaxTran96 avatar May 14 '24 19:05 MaxTran96