CFGpp
I get nonsensical results
I just ran the editing example from the README verbatim:
python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_inversion_cfg++" --cfg_guidance 0.6
but I get nonsensical results. Even with a larger number of steps (--NFE 50), this is what I get as output.
So I tried switching to SDXL with:
python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --NFE 50 --model sdxl
but I get IndexError: list index out of range in the text-embedding step. The full traceback is:
Traceback (most recent call last):
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 72, in <module>
    main()
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/examples/inversion.py", line 53, in main
    result = solver.sample(prompt1=[args.null_prompt, args.prompt],
  File "/home/wizard/mambaforge/envs/cfgpp/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/wizard/dev/bendai-lib-python/research/face_swapping/external/CFGpp/latent_sdxl.py", line 391, in sample
    pool_tgt_prompt_embed) = self.get_text_embed(prompt1[0], prompt1[2], prompt2[0], prompt2[2], clip_skip)
IndexError: list index out of range
Am I doing something wrong?
This was just an error in the examples/inversion.py script: the prompt has to be passed twice (I guess for the two text encoders of SDXL). Still, this is what I get at 50 NFE with:
python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_edit_cfg++" --cfg_guidance 0.6 --model sdxl --NFE 50
It seems pretty far from the intended result :-/
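For anyone hitting the same IndexError, here is a minimal sketch of why a two-element prompt list fails. It only mirrors the indexing visible in the traceback (prompt1[0] and prompt1[2]). The helper pick_prompts is hypothetical, the empty null prompt is an assumption, and the idea that the list should hold three entries with the prompt repeated is my reading of the traceback, not something confirmed by the repository.

```python
def pick_prompts(prompt1):
    # Hypothetical stand-in for the indexing seen in the traceback:
    # the sampler reads positions 0 and 2 of the prompt list.
    return prompt1[0], prompt1[2]


null_prompt = ""  # assumed value for illustration only
prompt = "a photography of baby fox"

# Two entries, as passed by the script in the traceback: index 2 does not exist.
try:
    pick_prompts([null_prompt, prompt])
    failed = False
except IndexError:
    failed = True

# Passing the prompt twice gives the list a third entry, so index 2 resolves.
fixed = pick_prompts([null_prompt, prompt, prompt])
```

This only explains the crash; it says nothing about output quality.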
In other words this paper is a scam? How is that possible? 🙀
I don't think so. But something is undocumented or unclear, and it would help if the author chimed in with an example of how to use the scripts to reproduce results similar to those in the paper.
@andreaferretti
Thank you for your valuable comment, and apologies for the delayed response.
Firstly, in our paper, we utilized NFE=10 for both the inversion and editing tasks.
We also observed unintended outputs with NFE=50. However, it is important to note that while the DDIM inversion with the original CFG fails even at NFE=10, CFG++ does not encounter this issue.
Lastly, we want to emphasize that the primary objective of our work is to address the issues identified with the original CFG rather than to propose the optimal algorithm for each specific task.
@andreaferretti Are you saying that CFG++ works only at NFE=10? But it seems to give nonsensical results even at NFE=10. Does it require other parameters? Can you publish the exact code examples to reproduce the results in the paper?
@Emasoft
Could you confirm whether you made any changes to our public code, and whether you are using the Diffusers version specified in the environment.yaml file?
When we clone our public code and run the following command, we get the following result:
python -m examples.inversion --prompt "a photography of baby fox" --method "ddim_inversion_cfg++" --cfg_guidance 0.6 --NFE 10
Additionally, we have just uploaded three image-text pairs from the COCO dataset, which are displayed in Figure 9. You can obtain the same results with NFE=10 and scale=0.2, as indicated in the figure.
Finally, we would like to clarify two points:
- For generation, CFG++ works just fine with higher NFEs. If you use higher NFEs, you may need to adjust the guidance scale to achieve the best performance. Also refer to other community discussions: https://www.reddit.com/r/comfyui/comments/1dqpcsh/new_samplers/ https://openart.ai/workflows/dugumatai/new-sampler-euler_cfg/oGP4a011iYE2UpeTtXNH
- For inversion, all experiments use 10 NFE, and we observe instabilities with higher NFEs due to inevitable error accumulation. Please note that while smaller guidance scales give us smaller error accumulation, we cannot eliminate it entirely. Our claim in the paper is that CFG++ still behaves much more favorably than CFG.
Okay, my bad. A clean install fixed it. When things didn't work with the original command, I made some changes to the code to understand where the problem was and may have broken something. After reinstalling everything and using the above command with NFE set to 10, the inversion works as expected. You should make these parameter limitations clearer in the README to avoid similar misunderstandings in the future. I'm looking forward to seeing further improvements to CFG++ in your future research.
@CFGpp-diffusion thank you for the clarifications. I agree that with NFE=10, CFG++ is more robust for inversion than normal CFG. With a higher number of steps, other inversion methods are more robust, and these can usually be combined with CFG++ easily, since CFG++ changes nothing in the inversion process, only the actual formula for CFG.
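To make "just the actual formula" concrete, here is a rough scalar sketch of one deterministic DDIM update under CFG and under CFG++, as I understand it from the paper. This is not the repository's code: the function names, the scalar inputs, and the alpha-bar values used in the example are all assumptions chosen for illustration. Both variants build the denoised estimate from the guided noise. They differ only in which noise is used to renoise.

```python
import math


def guided_eps(eps_uncond, eps_cond, scale):
    # Classifier-free guidance: move from the unconditional noise
    # prediction toward the conditional one by `scale`.
    return eps_uncond + scale * (eps_cond - eps_uncond)


def ddim_step(x_t, eps_uncond, eps_cond, scale, abar_t, abar_prev, use_cfgpp):
    # abar_t / abar_prev: cumulative alpha products at the current and next step.
    eps_hat = guided_eps(eps_uncond, eps_cond, scale)
    # Denoised estimate computed from the guided noise (same in both variants).
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_hat) / math.sqrt(abar_t)
    # The only difference: CFG renoises with the guided noise,
    # CFG++ renoises with the unconditional noise.
    renoise_eps = eps_uncond if use_cfgpp else eps_hat
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * renoise_eps


# Illustrative values only (not taken from any real model or schedule).
cfg_out = ddim_step(0.5, 0.1, 0.3, 0.6, 0.5, 0.8, use_cfgpp=False)
cfgpp_out = ddim_step(0.5, 0.1, 0.3, 0.6, 0.5, 0.8, use_cfgpp=True)
```

When the conditional and unconditional predictions coincide, the two updates are identical, which is one way to sanity-check an implementation.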
Hi @CFGpp-diffusion
Could you explain how I can reproduce the results from Table 1? I am having trouble finding information about the dataset used there. It would be most valuable to have code that approximately reproduces the results in all tables.
@MalarzDawid Thank you for your interest. We used MS-COCO 10K for the quantitative comparisons, following other conventional works. The revised paper will include the missing information about the dataset, among other additions.