Close, but no banana.

Open phalexo opened this issue 1 year ago • 13 comments

The prompt "HD realistic photo of a baby chimp with a boy."

In general I am not able to get any good text in images either.

Chimps1

phalexo avatar May 06 '23 13:05 phalexo

Bro. These are raw models. Fine-tuning is needed. For raw output, I consider that pretty amazing. I do wonder why they didn't use Flan-T5 instead; these wouldn't be issues, even with the raw model. It would follow instructions to a tee. I think fine-tuning on Flan-T5 should be added to the research list.

darkman111a avatar May 07 '23 15:05 darkman111a

This: https://github.com/deep-floyd/IF/discussions/89

Also the following

| Setting | Value |
| --- | --- |
| aspect: | 2:3 |
| prompt: | 3/4 portrait, resin bust, Mad Joker model, clear irises, 3d, pixar style, ultra perfect composition, liquid, detailing fluid, acrylic,85mm, photoreal |
| negative prompt: | logo, text, watermark, word, signature, label, sign, meme |
| style: | in style of Bill Sienkiewicz |
| seed: | 3345106520 |
| if_I guidance: | 14.0 (watermarking was turned off for this one) |

joker_bill_sienkiewicz_3

| Setting | Value |
| --- | --- |
| aspect: | 3:2 |
| prompt: | a black and white medium format 85mm portrait of a kitten wearing a tuxedo on his way to a funeral, the image is high quality and highly detailed with the kitten's features clearly visible, photographer Edward Weston used Agfa Isopan ISO 25 film to create this image, which resembles Edward Weston's photograph Pepper No. 35 |
| seed: | 404353238 |
| if_I guidance: | 7.0 |

kitten_4

Admittedly, the kitten above is a teensy bit funky physically, but I spent no time trying to optimize the prompt; it's one I crafted for SD 1.0.

There's nothing wrong with the model.

tildebyte avatar May 07 '23 16:05 tildebyte

Not quite as crisp as the image above.

Joker1

phalexo avatar May 07 '23 22:05 phalexo

Joker2

phalexo avatar May 07 '23 22:05 phalexo

This kind of looks ok.

Kitty1

phalexo avatar May 08 '23 01:05 phalexo

@phalexo; I'm running all the full models in full resolution on a 48G VRAM RTX A6000 instance on RunPod[1]. What are you using?

[1] This is not meant as some kind of flex, but rather to point out that I'm essentially making no quality compromises with a setup like this.

tildebyte avatar May 08 '23 20:05 tildebyte

> @phalexo; I'm running all the full models in full resolution on a 48G VRAM RTX A6000 instance on RunPod[1]. What are you using?
>
> [1] This is not meant as some kind of flex, but rather to point out that I'm essentially making no quality compromises with a setup like this.

I have spread the model over 3 GPUs, Titan X with 12.3GiB each. I did have to set the type to float16 for T5, otherwise it causes a runtime cuBLAS error.

- T5 is about 11.6GiB
- if_I is about 9.2GiB
- if_II + if_III is about 5.8GiB

I am generating a single image, not sure why two busts come out.

torch==2.0.0+cu118

phalexo avatar May 08 '23 20:05 phalexo

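    import torch
    from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
    from deepfloyd_if.modules.t5 import T5Embedder

    # One stage per GPU where possible; T5 runs in float16 to avoid the cuBLAS runtime error.
    if_I = IFStageI('IF-I-XL-v1.0', device='cuda:1')
    if_II = IFStageII('IF-II-L-v1.0', device='cuda:2')
    if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
    t5 = T5Embedder(device='cuda:0', torch_dtype=torch.float16)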
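For anyone trying a similar split, here is a rough sanity check of that placement against the numbers quoted in this thread. It is only a sketch: the sizes are the approximate figures reported above, the helper name `headroom_per_device` is made up for illustration, and it counts loaded weights only, so activations during sampling will need extra room on top.

```python
# Approximate component sizes reported in the thread, in GiB.
COMPONENT_GIB = {"t5": 11.6, "if_I": 9.2, "if_II + if_III": 5.8}

# Device placement used in the thread (one device per component group).
PLACEMENT = {"t5": "cuda:0", "if_I": "cuda:1", "if_II + if_III": "cuda:2"}

# Per-GPU memory quoted for each Titan X.
GPU_CAPACITY_GIB = 12.3


def headroom_per_device(sizes, placement, capacity):
    """Return the GiB left on each device after loading its components.

    Hypothetical helper for illustration; it is not part of deepfloyd_if.
    """
    used = {}
    for name, device in placement.items():
        used[device] = used.get(device, 0.0) + sizes[name]
    return {device: round(capacity - total, 1) for device, total in used.items()}


headroom = headroom_per_device(COMPONENT_GIB, PLACEMENT, GPU_CAPACITY_GIB)
for device, free in sorted(headroom.items()):
    status = "fits" if free >= 0 else "over budget"
    print(f"{device}: {free} GiB free ({status})")
```

By these figures T5 leaves well under 1 GiB spare on its card, which is probably why it is the component that needed float16.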

phalexo avatar May 08 '23 21:05 phalexo

@phalexo;

> not sure why two busts come out

aspect='2:3' is super important.

If you give it a wide aspect (3:2), the model will fill the space with something, usually a duplicate.
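A quick way to catch this before generating is to check the orientation of the aspect string. The sketch below assumes the width:height reading used in this thread (3:2 is wide, 2:3 is tall); `orientation` is a hypothetical helper, not something from the IF codebase.

```python
def orientation(aspect: str) -> str:
    """Classify a 'W:H' aspect string as 'portrait', 'landscape', or 'square'.

    Hypothetical helper for illustration; assumes the width:height convention.
    """
    width, height = (int(part) for part in aspect.split(":"))
    if width <= 0 or height <= 0:
        raise ValueError(f"aspect must use positive integers, got {aspect!r}")
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


# A single-subject portrait prompt (such as a bust) pairs best with a tall frame.
for aspect in ("2:3", "3:2", "1:1"):
    print(aspect, "->", orientation(aspect))
```

If a single-subject prompt reports `landscape` here, the extra width is where the duplicate tends to show up.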

tildebyte avatar May 09 '23 16:05 tildebyte

Much better. The aspect ratio was affecting quality.

Joker4

phalexo avatar May 09 '23 18:05 phalexo

Hugging1

phalexo avatar May 09 '23 21:05 phalexo

WowGirl1

phalexo avatar May 10 '23 22:05 phalexo

> WowGirl1

What's its prompt? Thanks.

Cloud-Pku avatar May 18 '23 03:05 Cloud-Pku