Extremely slow generation
Hi, my GPU is a GTX 1660 (6 GB), and while using ELLA my speed drops from 1.5 it/s to 5 s/it. It seems like the CUDA cores are almost idle and my CPU does most of the calculations instead.
Can I take a look at your workflow?
I've got a similar problem. In my case it wasn't the KSampler that was taking too long, it was the ELLA Text Encode node.
I'm using an entry-level gaming laptop with these specs:
- Ryzen 5 3550H
- GTX 1650 (4 GB)
- 24 GB RAM

Same as OP, it uses the CPU instead when encoding. I don't know if that's how it's supposed to work, though, as I have a very limited programming background. Thanks.
@jcatsuki I'm not sure if you are still interested in this, but ELLA does indeed use the CPU instead of the GPU when encoding text, unless one of these conditions applies:
- ComfyUI reports that you have `NORMAL_VRAM` or `HIGH_VRAM` (which I will assume, since it will do that with shared memory), and you have a GPU that works with FP16 (the 16xx series is not one of them, according to ComfyUI's code)
- you forcibly tell ComfyUI to only use the GPU via the `--gpu-only` flag, but that might slow down the diffusion process by a lot if you don't have enough VRAM.
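To illustrate why the 16xx cards end up on CPU, here is a standalone sketch of that decision. This is not ComfyUI's actual code; the names mirror `comfy.model_management` but the FP16 check is a simplified stand-in for illustration only:

```python
from enum import Enum

class VRAMState(Enum):
    LOW_VRAM = 0
    NORMAL_VRAM = 1
    HIGH_VRAM = 2

def supports_fp16(gpu_name: str) -> bool:
    # Simplified stand-in: ComfyUI's real detection inspects the device,
    # but the effect is that 16xx-series cards are treated as no-FP16.
    return not any(s in gpu_name for s in ("1650", "1660"))

def text_encoder_device(vram_state: VRAMState, gpu_name: str) -> str:
    # The GPU is only used for text encoding when there is enough VRAM
    # *and* the card can run FP16; otherwise encoding falls back to CPU.
    if vram_state in (VRAMState.NORMAL_VRAM, VRAMState.HIGH_VRAM) and supports_fp16(gpu_name):
        return "cuda"
    return "cpu"

print(text_encoder_device(VRAMState.NORMAL_VRAM, "GTX 1660"))  # cpu: no FP16 on 16xx
print(text_encoder_device(VRAMState.NORMAL_VRAM, "RTX 3060"))  # cuda
```

So even with plenty of VRAM, a 16xx card fails the FP16 test and encoding lands on the CPU.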
An alternative that works for me, but requires a little bit of hacky code editing, is to edit model.py in the ComfyUI-ELLA directory like so:
At roughly line 118, change `model_management.text_encoder_device()` to `model_management.get_torch_device()`. That function exists in ComfyUI and will try to select any available acceleration device.
```diff
 class T5TextEmbedder:
     def __init__(self, pretrained_path="google/flan-t5-xl", max_length=None, dtype=None, legacy=True):
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()
```
and on roughly line 312:
```diff
 class ELLA:
     def __init__(self, path: str, **kwargs) -> None:
-        self.load_device = model_management.text_encoder_device()
+        self.load_device = model_management.get_torch_device()
```
This might not be the most elegant solution, but it sure works well for me, cutting the encoding time from 6 minutes down to just a couple of seconds.
IMO, there should be an option in the ELLA node to either use the GPU when available (separately from ComfyUI's decision) or force the GPU or CPU. I will open a pull request if I make the change.
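A minimal sketch of what that option's logic could look like. The function name and choice strings here are hypothetical, not the actual ComfyUI-ELLA API:

```python
def resolve_device(choice: str, gpu_available: bool) -> str:
    # "auto" keeps today's behaviour (follow whatever is available),
    # while "gpu"/"cpu" force the device regardless of ComfyUI's decision.
    if choice == "gpu":
        return "cuda"
    if choice == "cpu":
        return "cpu"
    return "cuda" if gpu_available else "cpu"

print(resolve_device("auto", gpu_available=False))  # cpu
print(resolve_device("gpu", gpu_available=False))   # cuda (forced)
```

In the node, this would just be a dropdown input whose value is passed through a resolver like this before loading the model.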
@Chanakan5591 Well done! You are my hero!