
I get nonsensical results

Open andreaferretti opened this issue 1 year ago • 10 comments

I just tried the editing example from the README, verbatim:

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_inversion_cfg++" --cfg_guidance 0.6

but I get nonsensical results. Even with a larger number of steps (--NFE 50), this is what I get as output:

(image: reconstruct)

So I tried switching to SDXL with

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --NFE 50 --model sdxl

but I get IndexError: list index out of range in the textual embedding part, more precisely

Traceback (most recent call last):
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 72, in <module>
    main()
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 53, in main
    result = solver.sample(prompt1=[args.null_prompt, args.prompt],
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/latent_sdxl.py", line 391, in sample
    pool_tgt_prompt_embed) = self.get_text_embed(prompt1[0], prompt1[2], prompt2[0], prompt2[2], clip_skip)
IndexError: list index out of range
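The last two frames of the traceback suggest a simple mismatch: the call site passes a two-element list (`[args.null_prompt, args.prompt]`), while the failing line reads `prompt1[2]`. Below is a minimal sketch of that mismatch using plain lists only. `pick_prompts` is a hypothetical stand-in that copies just the indexing pattern from the traceback; it is not the repository's `sample` or `get_text_embed`, and the three-element list is an assumption about what avoids the error, not a statement of what the script expects.

```python
def pick_prompts(prompts):
    # Hypothetical helper: mirrors only the indexing seen in the traceback,
    # which reads positions 0 and 2 of the prompt list.
    return prompts[0], prompts[2]


null_prompt = ""
prompt = "a photography of baby fox"

# Two elements, as passed by the example script in the traceback.
two_prompts = [null_prompt, prompt]
try:
    pick_prompts(two_prompts)
    error_message = None
except IndexError as err:
    error_message = str(err)

# Three elements (assumption: the prompt is simply repeated).
three_prompts = [null_prompt, prompt, prompt]
picked = pick_prompts(three_prompts)
```

With the two-element list, index 2 does not exist, which matches the reported `IndexError`; with three elements the same access succeeds.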

Am I doing something wrong?

andreaferretti avatar Jul 05 '24 15:07 andreaferretti

This was just an error in the examples/inversion.py script, the prompt has to be passed twice (I guess for the two text encoders of SDXL). Yet, this is what I get, at 50 NFE, with

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --model sdxl --NFE 50

(image: reconstruct)

It seems pretty far from the intended result :-/

andreaferretti avatar Jul 05 '24 15:07 andreaferretti

In other words this paper is a scam? How is that possible? 🙀

Emasoft avatar Jul 19 '24 08:07 Emasoft

I don't think so. But something is undocumented or unclear, and it would help if the authors chimed in with an example of how to use the scripts to reproduce results similar to those in the paper.

andreaferretti avatar Jul 19 '24 09:07 andreaferretti

@andreaferretti

Thank you for your valuable comment, and apologies for the delayed response.

Firstly, in our paper, we utilized NFE=10 for both the inversion and editing tasks.

We also observed unintended outputs with NFE=50. However, it is important to note that while the DDIM inversion with the original CFG fails even at NFE=10, CFG++ does not encounter this issue.

Lastly, we want to emphasize that the primary objective of our work is to address the issues identified with the original CFG rather than to propose the optimal algorithm for each specific task.

CFGpp-diffusion avatar Jul 20 '24 08:07 CFGpp-diffusion

@andreaferretti Are you saying that CFG++ works only at NFE=10? But it seems to give nonsensical results even at NFE=10. Does it require other parameters? Can you publish the exact code examples to reproduce the results in the paper?

Emasoft avatar Jul 20 '24 08:07 Emasoft

@Emasoft

Could you confirm whether you have made any changes to our public code, and whether you are using the Diffusers version specified in the environment.yaml file?
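One way to check the second point is to compare the installed package version with whatever the environment file pins. The sketch below is only an illustration: `pinned_version`, `installed_version`, and `matches_pin` are hypothetical helpers, and the regular expression assumes the file lists dependencies as `- name=version` or `- name==version` lines, which may not match the actual layout of environment.yaml.

```python
import re
from importlib import metadata
from typing import Optional


def pinned_version(env_text: str, package: str) -> Optional[str]:
    """Return the version pinned for `package` in environment-file text, if any."""
    # Accepts conda style (`- pkg=1.2.3`) and pip style (`- pkg==1.2.3`);
    # a trailing conda build string (`=py39_0`) is not captured.
    pattern = rf"^\s*-\s*{re.escape(package)}\s*==?\s*([0-9][^\s=#]*)"
    match = re.search(pattern, env_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(env_text: str, package: str) -> bool:
    """True only if a pin exists and equals the installed version."""
    pinned = pinned_version(env_text, package)
    return pinned is not None and pinned == installed_version(package)
```

For example, `matches_pin(open("environment.yaml").read(), "diffusers")` would report whether the active environment agrees with the pin, assuming the file follows the layout described above.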

When we clone our public code and run the following command, we get the following result:

python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_inversion_cfg++" --cfg_guidance 0.6 --NFE 10

(image)

Additionally, we have just uploaded three image-text pairs from the COCO dataset, which are displayed in Figure 9. You can obtain the same results with NFE=10 and scale=0.2, as indicated in the figure.

Finally, we would like to clarify two points:

  1. For generation, CFG++ works fine with higher NFEs. With higher NFEs, you can tune the guidance scale to get the best performance. See also these community discussions: https://www.reddit.com/r/comfyui/comments/1dqpcsh/new_samplers/ https://openart.ai/workflows/dugumatai/new-sampler-euler_cfg/oGP4a011iYE2UpeTtXNH

  2. For inversion, all our experiments use 10 NFE, and we observe instabilities with higher NFEs due to the inevitable accumulation of error. Although the smaller guidance scales reduce error accumulation, we cannot eliminate it entirely. Our claim in the paper is that CFG++ still behaves much more favorably than CFG.

CFGpp-diffusion avatar Jul 20 '24 10:07 CFGpp-diffusion

Okay, my bad. A clean install fixed it. When things didn't work with the original command, I made some changes to the code to understand where the problem was and may have broken something. After reinstalling everything and using the above command with NFE set to 10, the inversion works as expected. You should make these parameter limitations clearer in the README to avoid similar misunderstandings in the future. I'm looking forward to seeing further improvements to CFG++ in your future research.

Emasoft avatar Jul 20 '24 12:07 Emasoft

@CFGpp-diffusion thank you for the clarifications. I agree that with NFE=10, CFG++ is more robust for inversion than normal CFG. With a higher number of steps, other inversion methods are more robust - and usually these can be combined with CFG++ easily, since CFG++ changes nothing in the inversion process, just the actual formula for CFG
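To make the point about "just the actual formula" concrete, here is a minimal scalar sketch of one deterministic DDIM step under both guidance rules, as I understand them from the CFG++ paper. It is not taken from this repository's code: the function names are hypothetical, `abar_t` and `abar_prev` stand for the cumulative noise-schedule products at the current and next timestep, and the noise predictions are plain floats rather than network outputs.

```python
import math


def guided_noise(eps_uncond, eps_cond, scale):
    # Interpolate (or extrapolate) from the unconditional to the conditional prediction.
    return eps_uncond + scale * (eps_cond - eps_uncond)


def ddim_step(x_t, eps_denoise, eps_renoise, abar_t, abar_prev):
    # Estimate the clean sample with one noise prediction...
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_denoise) / math.sqrt(abar_t)
    # ...then move to the next noise level with a (possibly different) one.
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_renoise


def cfg_step(x_t, eps_uncond, eps_cond, w, abar_t, abar_prev):
    # Standard CFG: the guided noise is used for both denoising and renoising.
    eps = guided_noise(eps_uncond, eps_cond, w)
    return ddim_step(x_t, eps, eps, abar_t, abar_prev)


def cfgpp_step(x_t, eps_uncond, eps_cond, lam, abar_t, abar_prev):
    # CFG++ (assumed form): guided noise for denoising, with a scale in [0, 1],
    # and the unconditional noise for renoising.
    eps = guided_noise(eps_uncond, eps_cond, lam)
    return ddim_step(x_t, eps, eps_uncond, abar_t, abar_prev)
```

In this sketch the two rules share everything except the renoising term, which is why an inversion procedure built on `ddim_step` could in principle swap one rule for the other without other changes.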

andreaferretti avatar Jul 22 '24 08:07 andreaferretti

Hi @CFGpp-diffusion

Could you explain how I can reproduce the results from Table 1? I am having trouble finding information about the dataset used there. It would be most valuable to have code that approximately reproduces the results in all tables.

MalarzDawid avatar Aug 01 '24 06:08 MalarzDawid

@MalarzDawid Thank you for your interest. We used MS-COCO 10K for the quantitative comparisons, following other conventional works. The revised paper will include the missing information about the dataset, along with some further details.

geonyeong-park avatar Aug 08 '24 07:08 geonyeong-park