
Using quantized version with the pipeline

Open allthatido opened this issue 1 year ago • 5 comments

Hello

I am trying to run the comic generation notebook, but with a quantized model (SSD-1B) so that it fits in my 8 GB of VRAM. However, I am getting the error below:

The expanded size of the tensor (676) must match the existing size (2500) at non-singleton dimension 3. Target sizes: [2, 20, 676, 676]. Tensor sizes: [2500, 2500]

while running the line:

id_images = pipe(id_prompts, num_inference_steps=num_steps, guidance_scale=guidance_scale, height=height, width=width, negative_prompt=negative_prompt, generator=generator).images
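For reference, this is roughly how I am swapping SSD-1B in for the default SDXL checkpoint, simplified to plain diffusers (the notebook's own pipeline class and arguments may differ):

```python
# Rough sketch of how I load SSD-1B instead of the default SDXL checkpoint;
# the notebook's custom pipeline class may differ, this is the plain
# diffusers equivalent.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
```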

Can you help solve this?

Thanks

allthatido avatar May 05 '24 08:05 allthatido

We are glad to help solve your problem. I am not familiar with SSD-1B, so it may take some time; I expect to update the code in 1-2 days.

Z-YuPeng avatar May 06 '24 02:05 Z-YuPeng

SSD-1B is a quantized version of SDXL where the precision of the weights is reduced from higher-precision 16/32-bit floating point to 1-bit weights. This makes inference very fast with some compromise on quality. There are also many LoRA-trained models that could be used if this were implemented.
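For example, diffusers can attach such a LoRA directly to the pipeline; the repo id below is only a placeholder, not a model I have tested with StoryDiffusion:

```python
# Sketch only: the LoRA repo id is a placeholder, not a tested model.
pipe.load_lora_weights("some-user/some-sdxl-lora")
pipe.fuse_lora()  # optionally bake the LoRA weights into the base model
```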

There is some tensor shape mismatch, but I am unable to figure it out. Thank you for your time.

allthatido avatar May 06 '24 07:05 allthatido

I think I'm running into the same type of problem, getting an error with all the standard models (RealVision, Unstable). I'm running on a local machine with 64 GB of RAM and an NVIDIA RTX 4090 (mobile/laptop) with 16 GB of VRAM. The error that I get is:

RuntimeError: The expanded size of the tensor (1024) must match the existing size (3072) at non-singleton dimension 3. Target sizes: [2, 20, 1024, 1024]. Tensor sizes: [3072, 3072]
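My guess (not verified against the code) is that the attention mask is pre-built for one resolution while inference runs at another; for SDXL the deepest self-attention level sees (height // 32) * (width // 32) tokens, which seems to be where the 1024 in my error comes from:

```python
# Guess only: token count at SDXL's deepest self-attention level.
height, width = 1024, 1024              # what I pass to the pipeline
print((height // 32) * (width // 32))   # 1024 -> the "target" size in my error
```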

themarshall68 avatar May 09 '24 18:05 themarshall68

I am running SDXL unquantized just fine on my 8 GB 1080 (Fooocus), with multiple LoRAs. What else is going on here that makes this run out of VRAM? I don't think quantized SDXL is the answer. If multiple large models need to be pipelined together (why, though?), couldn't the model loading and processing be handled in a better way? I'm no expert, but load one model -> process all frames -> unload it, load the next model in the pipeline -> process the frames -> and so on seems like it would be memory-efficient.
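Something like this sketch of the idea, using plain diffusers (not the project's actual code; the model id and prompts are placeholders):

```python
# Sketch of the "load -> process all frames -> unload" idea; not
# StoryDiffusion's actual code, just the general pattern with diffusers.
import gc
import torch
from diffusers import StableDiffusionXLPipeline

def run_stage(model_id, prompts, **gen_kwargs):
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    images = [pipe(p, **gen_kwargs).images[0] for p in prompts]
    # Free the VRAM before the next stage loads its model.
    del pipe
    gc.collect()
    torch.cuda.empty_cache()
    return images

# frames = run_stage("stabilityai/stable-diffusion-xl-base-1.0", my_prompts, num_inference_steps=30)
```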

zombri-eats-brainz avatar May 11 '24 02:05 zombri-eats-brainz

> I am running SDXL unquantized just fine on my 8 GB 1080 (Fooocus), with multiple LoRAs.

Can you please share some more info on your setup and workflow? I have a GTX 1080 but get a CUDA out-of-memory error if I try to run any of the .py files or the Gradio app.

allthatido avatar May 11 '24 13:05 allthatido