
Case study: Anya

claforte opened this issue 1 year ago (status: open, 5 comments).

This will track progress on reconstructing Anya. https://spy-x-family.fandom.com/wiki/Anya_Forger

Disclaimers:

  • These results were obtained on an A100 40GB GPU. You can try to reproduce them using scripts/run_image_anya.sh, but there's no guarantee you will, especially if you reduce the batch size.

TODOs

[ ] give her a haircut after Zero123 by generating 6 standard views, allowing users to replace the side view with one that has less hair (Vikram)
[ ] optimize the runtime and VRAM usage (reduce batch size, number of iters) (claforte)
[ ] improve quality with a front reference view during SD SDS (claforte)
[ ] try the DeepFloyd model instead of SD
[ ] try a finetuned version of SD for SDS, ideally one that knows about Anya, e.g. https://civitai.com/models/9462/anya-forger
[ ] try disabling textureless mode; it may cause the hallucination of a 3rd arm
[ ] debug why some batches are way too close to the subject, causing cropping

2023-05-16 evening:

Best result so far... Result of phase 3 (~4 more hours):

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/1ead192f-4ead-43ce-9bfe-7a68ddee4810

Result of phase 2 (~3 hours on 1 A100?):

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/655168c5-3280-4726-91c8-ca07cef3339e

2023-05-16 improved Anya:

Results of improved phase 2:

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/8f430deb-2ef8-4d0a-a1b4-f19090138a9d

Another attempt with different hyperparams:

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/479d8e4f-4446-42d7-b8b3-d930047ee279

(will provide details shortly)

Downsides:

  1. She doesn't look quite like Anya. SDS and the RGBD loss keep bickering over the two brown accessories on her head and the shape of her hair.

2023-05-15 Haircut:

Took the side view of Anya and gave her a haircut:

[images: original side view -> side view with haircut]

Then continued training with two images, the front image and the side image with the haircut:

[image: fromchrbs1]

The haircut is preserved well and emerges after just ~20 epochs.

Tried adding a third image, but it didn't work much better. Also tried training from scratch, but results were better when continuing to train from the last checkpoint.

Tried tweaking other parameters:

  • textureless_ratio=0 did not help, so in fact textureless_ratio > 0 helps: [image: Screenshot 2023-05-16 at 23 24 21]

  • Progressive levels are a sound idea, but perhaps newly enabled levels are not initialized to act as the identity function, so results degrade whenever the level changes and don't recover later: [image: Screenshot 2023-05-16 at 23 25 30]
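The identity-initialization concern in the last bullet can be illustrated with a toy 1D model. This is a minimal sketch, not stable-dreamfusion's grid encoder: the per-level feature tables, the fixed linear readout, and level_weights are all hypothetical. It compares a hard switch, where an enabled level immediately adds its features to the output, with the cosine ramp from BARF's coarse-to-fine schedule, swapped in here as one possible remedy. With the ramp, a newly enabled level starts at weight 0, so at that moment the output is unchanged, which is the identity behavior the bullet asks for.

```python
import numpy as np


def level_weights(progress, num_levels, smooth=True):
    """Per-level weights for coarse-to-fine training (hypothetical helper).

    Level k starts turning on once progress passes k and is fully on at k + 1.
    smooth=True:  BARF-style cosine ramp, so a new level starts at weight 0.
    smooth=False: hard switch, so a new level jumps straight to weight 1.
    """
    k = np.arange(num_levels)
    alpha = np.clip(progress - k, 0.0, 1.0)
    if smooth:
        return (1.0 - np.cos(alpha * np.pi)) / 2.0
    return (alpha > 0).astype(float)


def toy_field(x, progress, tables, readout, smooth):
    """Toy multi-resolution encoder plus a fixed linear readout, for x in [0, 1]."""
    w = level_weights(progress, len(tables), smooth)
    out = np.zeros_like(x)
    for level, table in enumerate(tables):
        res = len(table)
        # Nearest-cell lookup into this level's feature table.
        idx = np.minimum((x * res).astype(int), res - 1)
        out += w[level] * readout[level] * table[idx]
    return out


rng = np.random.default_rng(0)
# Hypothetical feature tables whose resolution doubles per level (4, 8, 16, 32 cells).
tables = [rng.normal(size=2 ** (level + 2)) for level in range(4)]
readout = rng.normal(size=len(tables))
x = np.linspace(0.0, 1.0, 256)


def output_jump(smooth, progress=1.0, eps=1e-3):
    """Largest output change as progress crosses the point where level 1 is enabled."""
    before = toy_field(x, progress, tables, readout, smooth)
    after = toy_field(x, progress + eps, tables, readout, smooth)
    return float(np.max(np.abs(after - before)))


print(f"hard switch: max output change {output_jump(smooth=False):.4f}")
print(f"cosine ramp: max output change {output_jump(smooth=True):.2e}")
```

In a real model the readout (an MLP) is trained too, so a ramp only gives the network time to adapt to the new level; whether it would fix the degradation described above is untested.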

2023-05-12 1st attempt:

Phased results

Phase 1 - Zero123 from 1 front reference image

The rough shape and colors of Anya are reconstructed, but:

  1. her face is concave
  2. her hair is excessively long

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/36701129-18f6-4451-89bf-93a1bfb9c571

Phase 2 - text prompt guidance using SD (2.1?)

Noticeable problems:

  1. Anya no longer looks the same (the RGBD guidance isn't applied yet in this phase?)
  2. The excess hair in the back transformed into an extra arm. Sigh.
  3. The ears are also deformed and asymmetrical.
  4. her eyes are holes: [image: https://github.com/ashawkey/stable-dreamfusion/assets/5090329/6032f034-f4e9-4a88-8587-ac89a8e99f27]

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/2a964da3-9cdc-4f44-91f8-a28635d3c42a

Phase 3 - higher-res text prompt guidance using SD (2.1?)

https://github.com/ashawkey/stable-dreamfusion/assets/5090329/2ba1daab-d2b4-4a9b-8193-780fde13125e

Most noticeable problems:

  1. Her arms are deformed
  2. Her eyes are dark pits of despair

Specific improvements/ideas

Reduce the jitter

The default jitter seems too high to me. The default is equivalent to --jitter_pose --jitter_center 0.2 --jitter_target 0.2 --jitter_up 0.02:

[image: step_0010010_excessive_default_jitter]

I noticed that when parts of the object are often cropped (e.g. Anya's feet), they rapidly diverge into an incorrect configuration; for example, the feet reverse so they point towards her back, which causes other Janus problems to emerge.

For this reconstruction, I instead used --jitter_pose --jitter_center 0.015 --jitter_target 0.015 --jitter_up 0.01:

[image: step_0010010_smaller_jitter]
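To make the cropping argument concrete, here is a minimal geometric sketch, assuming cameras on a sphere of radius 2 that look at the origin, an object bounding sphere of radius 0.7, and a 60° field of view; these values are hypothetical, not stable-dreamfusion's defaults. Center jitter is modeled as uniform noise in ±jitter_center/2 per axis and target jitter as Gaussian noise with standard deviation jitter_target per axis; this only mimics the flag names and is an assumption about how they behave. The check uses a circular view cone, so camera roll (jitter_up) has no effect and is left out.

```python
import numpy as np


def cropped_fraction(jitter_center, jitter_target, n=10000, cam_radius=2.0,
                     obj_radius=0.7, fov_deg=60.0, seed=0):
    """Fraction of jittered look-at cameras whose view cone does not fully
    contain the object's bounding sphere (centered at the origin).

    All geometry values are hypothetical. Center jitter is uniform in
    +/- jitter_center / 2 per axis; target jitter is Gaussian with standard
    deviation jitter_target per axis (both are assumptions for this sketch).
    """
    rng = np.random.default_rng(seed)
    # Un-jittered cameras: random directions on a sphere of radius cam_radius.
    dirs = rng.normal(size=(n, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    centers = cam_radius * dirs
    centers += rng.uniform(-jitter_center / 2, jitter_center / 2, size=(n, 3))
    # Un-jittered cameras look at the origin; jitter moves the look-at target.
    targets = rng.normal(scale=jitter_target, size=(n, 3))

    forward = targets - centers
    forward /= np.linalg.norm(forward, axis=1, keepdims=True)
    dist = np.linalg.norm(centers, axis=1)
    to_origin = -centers / dist[:, None]

    # Angle between the optical axis and the direction to the object's center.
    cos_off = np.clip(np.sum(forward * to_origin, axis=1), -1.0, 1.0)
    off_axis = np.arccos(cos_off)
    # Angular radius of the bounding sphere as seen from each camera.
    angular_radius = np.arcsin(np.clip(obj_radius / dist, 0.0, 1.0))
    # Circular view cone of half-angle fov/2, so camera roll is ignored.
    half_fov = np.deg2rad(fov_deg) / 2
    fully_visible = (dist > obj_radius) & (off_axis + angular_radius <= half_fov)
    return float(np.mean(~fully_visible))


for jc, jt in [(0.2, 0.2), (0.015, 0.015)]:
    frac = cropped_fraction(jc, jt)
    print(f"jitter_center={jc}, jitter_target={jt}: "
          f"{frac:.1%} of views crop the bounding sphere")
```

With these made-up numbers, the larger jitter should crop a clearly non-zero share of views while the smaller one crops essentially none; the exact percentages depend entirely on the assumed radii and field of view.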

Posted by claforte, May 12 '23 17:05

This is awesome!

  1. I have some results that show better hair:
python main.py -O --image data/anya_front_rgba.png --workspace trial_image_anya --iters 5000

https://github.com/ashawkey/stable-dreamfusion/assets/25863658/ea73c4b5-6e70-40df-89e1-d32d7b9cb5a8

After the DMTet stage:

python main.py -O --image data/anya_front_rgba.png --workspace trial2_image_anya --iters 5000 --dmtet --init_with trial_image_anya/checkpoints/df.pth

https://github.com/ashawkey/stable-dreamfusion/assets/25863658/f64967c2-42f7-4370-b8aa-22adb53f9216

I think jitter should be disabled, since zero123 only supports camera rotation and scaling, while jitter introduces translation?

  2. Despite the identity, the SD results are quite amazing! I think the SD guidance is so strong that it finally overweighs the RGB guidance, even though we do RGB guidance every 1 iter. I think the major problem is that Stable Diffusion simply cannot generate the specific Anya Forger; maybe we have to apply some fine-tuned DreamBooth model (but that's way too complicated compared to zero123).
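The jitter point in item 1 can be checked with a small geometric sketch. It assumes, as the comment suggests but this sketch does not verify, that zero123 can only express cameras that orbit the object center and look at it, i.e. poses fully described by elevation, azimuth, and radius. A z-up world is also assumed, and all function names (spherical_params, off_center_distance, roll_degrees, fits_orbit_params) are hypothetical, not stable-dreamfusion or zero123 APIs.

```python
import numpy as np


def _unit(v):
    return v / np.linalg.norm(v)


def spherical_params(center):
    """(elevation_deg, azimuth_deg, radius) of a camera position; z-up assumed."""
    radius = float(np.linalg.norm(center))
    elevation = float(np.degrees(np.arcsin(center[2] / radius)))
    azimuth = float(np.degrees(np.arctan2(center[1], center[0])))
    return elevation, azimuth, radius


def off_center_distance(center, target):
    """Distance from the origin to the optical axis; 0 if the camera looks at the origin."""
    forward = _unit(target - center)
    to_origin = -center
    # Remove the component of to_origin that lies along the optical axis.
    perp = to_origin - np.dot(to_origin, forward) * forward
    return float(np.linalg.norm(perp))


def roll_degrees(center, target, up, world_up=None):
    """Roll relative to a roll-free look-at camera.
    Undefined when the camera looks straight along world_up."""
    if world_up is None:
        world_up = np.array([0.0, 0.0, 1.0])
    forward = _unit(target - center)
    ref_up = _unit(world_up - np.dot(world_up, forward) * forward)
    cam_up = _unit(up - np.dot(up, forward) * forward)
    return float(np.degrees(np.arccos(np.clip(np.dot(ref_up, cam_up), -1.0, 1.0))))


def fits_orbit_params(center, target, up, dist_tol=1e-6, roll_tol_deg=1e-3):
    """True if (elevation, azimuth, radius) fully describe the pose."""
    return (off_center_distance(center, target) < dist_tol
            and roll_degrees(center, target, up) < roll_tol_deg)


center = np.array([2.0, 0.0, 0.5])
origin = np.zeros(3)
up = np.array([0.0, 0.0, 1.0])
poses = {
    "no jitter": (center, origin, up),
    "center jitter": (center + np.array([0.05, -0.03, 0.02]), origin, up),
    "target jitter": (center, np.array([0.0, 0.1, 0.0]), up),
    "up jitter": (center, origin, up + np.array([0.0, 0.02, 0.0])),
}
for name, (c, t, u) in poses.items():
    elev, azim, rad = spherical_params(c)
    print(f"{name:>13}: elev={elev:.2f} azim={azim:.2f} r={rad:.3f} "
          f"representable={fits_orbit_params(c, t, u)}")
```

Under that assumption, jittering only the camera center (with the target kept at the origin) still gives a pose on some orbit, just with slightly different parameters; jittering the target moves the optical axis off the object center, and jittering the up vector adds roll, and neither can be expressed with those three parameters.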

Posted by ashawkey, May 13 '23 06:05

"I think jitter should be disabled as zero123 only support camera rotation and scaling"

Let me clarify: I don't use jitter during phase 1 (zero123); I only use it in the later phases (using SD 2.x?).

"Despite the identity, the SD results are quite amazing! I think the SD guidance is so strong that it finally overweighs the RGB guidance, even though we do RGB guidance every 1 iter."

I agree that the SD results are promising. I wasn't even providing the reference image in phases 2-3, but more fixes are required to handle the 2D conditioning with batch_size>1, etc. I'm working on it. :-)

Posted by claforte, May 15 '23 13:05

@ashawkey: I tweaked hyperparams a lot and ended up with much better results... see the updated description on top. :-) I'll share the script tomorrow morning.

Posted by claforte, May 17 '23 03:05

Zero123 first author here. I'm thrilled to see the great efforts by both of you guys to improve zero123 results with SDS guidance, @claforte @ashawkey! If at any point I can be of any help or clarify anything in the paper or code, please don't hesitate to reach out!

Posted by ruoshiliu, May 17 '23 07:05

@ruoshiliu Thanks for open-sourcing the great work! We will :-)

Posted by ashawkey, May 17 '23 12:05