[Feature request] Add support for HiDiffusion
comfyUI-Manager/extension-node-map.json at main · ltdrdata/ComfyUI-Manager
If you could also look into Hyper SD as well, that would be nice.
I made a node with the MSW-MSA attention part: https://gist.github.com/blepping/02e389f660112097983684a8ea8093b1
It's a noticeable speed increase for high-res generation and seems like it might also increase quality/reduce artifacts.
It doesn't seem to make any difference with SDXL, or is it just me? Are we supposed to connect this after the checkpoint and enter higher resolution numbers? Or is it supposed to make even the default resolution faster?
Using HiDiffusion, you can prevent the eight-limbs artifact even at high resolutions like 2048x2048.
I know about that, but this implementation doesn't seem to work at all: it neither speeds things up at normal or high resolutions nor seems to enhance quality at any setting. (I used both 2,3 / 3,4,5 and the default values, which are meant for SD1.5; neither makes a difference with SDXL.)
@patientx
My node only implements the MSW-MSA attention part, not the RAU-Net part. I haven't done much testing with SDXL; I just checked that I still got reasonable results. With SD1.5 at least, the speed increase is about 30% at high resolution (i.e. 1536x1536), and it may help with artifacts, but the attention part alone doesn't let you generate natively at high resolution. You'll need to combine it with something like deep shrink.
The default SD1.5 settings aren't going to work properly with SDXL; SDXL doesn't even have an 11th block. HiDiffusion's code has:
down_blocks.1.attentions.0.transformer_blocks.0
down_blocks.1.attentions.0.transformer_blocks.1
down_blocks.1.attentions.1.transformer_blocks.0
down_blocks.1.attentions.1.transformer_blocks.1
up_blocks.1.attentions.0.transformer_blocks.0
up_blocks.1.attentions.0.transformer_blocks.1
up_blocks.1.attentions.1.transformer_blocks.0
up_blocks.1.attentions.1.transformer_blocks.1
up_blocks.1.attentions.2.transformer_blocks.0
up_blocks.1.attentions.2.transformer_blocks.1
for SDXL, but that's the diffusers naming convention. I thought the conversion was first_number*3 + second_number, but that might be wrong. (Down blocks are input, up blocks are output.)
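As a rough sketch, that guessed conversion would look something like this. Note that both the heuristic (first_number*3 + second_number) and the helper itself are illustrative; the heuristic may well be wrong, as noted above, and this function isn't part of the linked gist:

```python
import re

def diffusers_to_comfy(name):
    """Map a diffusers attention-block name to a ComfyUI-style
    (section, index) pair using the guessed heuristic above:
    index = first_number * 3 + second_number.
    WARNING: illustrative only; the heuristic may be wrong for SDXL.
    """
    m = re.match(r"(down|up)_blocks\.(\d+)\.attentions\.(\d+)\.", name)
    if m is None:
        raise ValueError(f"unrecognized block name: {name}")
    direction, block, attn = m.group(1), int(m.group(2)), int(m.group(3))
    # down blocks correspond to input blocks, up blocks to output blocks
    section = "input" if direction == "down" else "output"
    return section, block * 3 + attn
```

For example, `down_blocks.1.attentions.0.transformer_blocks.0` would map to input block 3 under this guess.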
I did some more testing with SDXL. From what I can see, it's probably not really worth using for SDXL, at least from a performance standpoint. Also, the available blocks for SDXL seem to be:
input: 4,5,7,8
output: 0,1,2,3,4,5
You can try using something like input 4,5 and output 3,4,5; however, even at pretty high res it's only a small speed increase. Testing with deep shrink default settings, generating at 2560x1792, I get 3.47 it/s without the attention patch and 3.11 it/s with it.
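A tiny sketch of sanity-checking a chosen block set against the SDXL blocks listed above. The helper and its names are my illustration, not the gist's actual API:

```python
# Available MSW-MSA attention blocks for SDXL, per the list above.
SDXL_BLOCKS = {"input": {4, 5, 7, 8}, "output": {0, 1, 2, 3, 4, 5}}

def check_blocks(section, blocks):
    """Return the requested blocks sorted, raising if any of them
    isn't a valid SDXL block for that section.
    Illustrative helper, not part of the node's real interface.
    """
    bad = set(blocks) - SDXL_BLOCKS[section]
    if bad:
        raise ValueError(f"blocks {sorted(bad)} are not valid {section} blocks for SDXL")
    return sorted(blocks)
```

The suggested settings (input 4,5 / output 3,4,5) pass this check, while the SD1.5 defaults would not.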
The higher the resolution, the more steps, and the slower the sampler (i.e. samplers that call the model multiple times, like dpmpp_2s_ancestral or dpmpp_sde), the more benefit you'll see. My test was with dpmpp_2s_ancestral.
It does seem to help with artifacts at high res; however, SDXL in general doesn't seem to tolerate deep shrink very well.
Example workflow (should have metadata unless GitHub strips it):
Experimental implementation of the remaining parts: https://github.com/blepping/comfyui_jankhidiffusion
I strongly recommend reading the README, since there are more than a few gotchas.
Thank you! We can close this out.