stable-diffusion.cpp

feat: Longcat-Image / Longcat-Image-Edit support

Open stduhpf opened this issue 2 weeks ago • 15 comments

for https://github.com/leejet/stable-diffusion.cpp/issues/1052

sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024

output

Test models (converted to bfl format) can be found here:

  • https://huggingface.co/stduhpf/LongCat-Image-gguf/tree/main
  • https://huggingface.co/stduhpf/LongCat-Image-Edit-gguf/tree/main
  • https://huggingface.co/stduhpf/LongCat-Image-Dev-gguf/tree/main

Inference for models in diffusers format still seems to be broken.

stduhpf avatar Dec 05 '25 20:12 stduhpf

That does look a bit like a circuit board...

wbruna avatar Dec 05 '25 20:12 wbruna

TODO for when image generation works: image

stduhpf avatar Dec 06 '25 02:12 stduhpf

I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder. Maybe I'm missing an important detail, but I can't find it.

stduhpf avatar Dec 06 '25 15:12 stduhpf

I tried using my SplitAttention thing on a Flux model converted to diffusers format: output

I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

stduhpf avatar Dec 07 '25 21:12 stduhpf

I think I got it? output

With the padding fixed, but with diffusers format: output

stduhpf avatar Dec 08 '25 00:12 stduhpf

With the character-level tokenization trick: output

Might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's only applied to text wrapped in single quotes (').
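To illustrate the idea, here is a minimal Python sketch of character-level splitting for quoted text. It is not the code from this PR: the helper names `split_prompt` and `to_pieces` are hypothetical, and the real implementation feeds the Qwen tokenizer rather than returning plain strings.

```python
import re

def split_prompt(prompt: str, quote: str = "'"):
    """Split a prompt into (text, is_literal) segments.

    Text between a pair of `quote` characters is marked literal so that
    a caller can tokenize it one character at a time.
    """
    pattern = re.compile(re.escape(quote) + r"(.*?)" + re.escape(quote))
    segments = []
    pos = 0
    for m in pattern.finditer(prompt):
        if m.start() > pos:
            segments.append((prompt[pos:m.start()], False))
        segments.append((m.group(1), True))
        pos = m.end()
    if pos < len(prompt):
        segments.append((prompt[pos:], False))
    return segments

def to_pieces(prompt: str, quote: str = "'"):
    """Return the pieces a tokenizer would receive (hypothetical helper)."""
    pieces = []
    for text, is_literal in split_prompt(prompt, quote):
        if is_literal:
            pieces.extend(list(text))  # one piece per character
        else:
            pieces.append(text)  # left whole for the regular tokenizer
    return pieces
```

Calling `to_pieces("a sign saying 'CAT' here")` keeps the surrounding text whole and splits only the quoted word into `C`, `A`, `T`.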

stduhpf avatar Dec 08 '25 01:12 stduhpf

Oh no, why are there so many conflicts now?

stduhpf avatar Dec 08 '25 01:12 stduhpf

Using ' as a quote delimiter was a bad idea because it's the same symbol used for apostrophes. I will change it to detect " instead.
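A small Python sketch of the ambiguity, using a made-up prompt and a hypothetical `quoted_spans` helper (an illustration only, not the parser used in this PR):

```python
import re

def quoted_spans(prompt: str, quote: str):
    """Return the text found between successive pairs of `quote`."""
    q = re.escape(quote)
    return re.findall(q + r"(.*?)" + q, prompt)

# Hypothetical prompt containing two apostrophes and one quoted phrase.
prompt = "Change the text to say \"I'm a long one\", it's fine"

# With ' as the delimiter, the two apostrophes pair up with each other.
single = quoted_spans(prompt, "'")

# With " as the delimiter, only the intended phrase is captured.
double = quoted_spans(prompt, '"')
```

Here `single` captures the span between the apostrophes in "I'm" and "it's", which is not the intended literal text, while `double` captures only the phrase inside the double quotes.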

stduhpf avatar Dec 08 '25 11:12 stduhpf

Somehow not fully working yet, but it's definitely able to see it's supposed to be a cat holding a sign, maybe because of the vision model:

sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""

ref out
flux1-dev-q8_0 output

(Also I made the change so it now needs double quotes around literal text)

stduhpf avatar Dec 08 '25 12:12 stduhpf

output

Somehow couldn't get it to remove the original text, but there it goes

stduhpf avatar Dec 08 '25 13:12 stduhpf

May I ask which comfyui node is used to load this GGUF model?

Rocky-Lee-001 avatar Dec 10 '25 06:12 Rocky-Lee-001

Now supports UTF-8 encoding properly for the quoted text. (Also, quote characters are no longer stripped from the prompt after being parsed, which seems to help a bit, especially with longer text.)
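For context, splitting quoted text byte by byte would cut multi-byte UTF-8 characters apart. A minimal Python sketch of splitting on code-point boundaries instead, assuming the input is a raw byte string as it would be in C++ (the function name `utf8_chars` is hypothetical, and this is not the PR's code):

```python
def utf8_chars(data: bytes):
    """Split UTF-8 bytes into one bytes object per code point.

    Sequence length is read from the lead byte only; continuation
    bytes and truncated input are not validated in this sketch.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1  # ASCII
        elif lead >> 5 == 0b110:
            n = 2
        elif lead >> 4 == 0b1110:
            n = 3
        elif lead >> 3 == 0b11110:
            n = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        chars.append(data[i:i + n])
        i += n
    return chars
```

Each returned element can then be treated as a single character for the character-level tokenization, regardless of how many bytes it occupies.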

stduhpf avatar Dec 12 '25 01:12 stduhpf

> May I ask which comfyui node is used to load this GGUF model?

@Rocky-Lee-001 I don't think LongCat-Image is natively supported by ComfyUI yet. You could give https://github.com/sooxt98/comfyui_longcat_image a try; maybe it works well with the GGUF node for ComfyUI.

stduhpf avatar Dec 12 '25 02:12 stduhpf

I’m not sure whether I did something wrong on my end, but I got a strange image.

.\bin\Release\sd-cli.exe --diffusion-model  ..\models\longcat_bfl_format-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ae.sft  --llm ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p 'a lovely cat' --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa
output

leejet avatar Dec 13 '25 07:12 leejet

@leejet that's strange. I can reproduce it with the same prompt (even with the Q8_0 model), but I haven't gotten anything like this in my earlier testing. Maybe there's a linear layer that could use scaling?

Does not seem related to seed.

It's a combination of short prompts + low resolution that seems to cause it.

stduhpf avatar Dec 13 '25 14:12 stduhpf