# Case study: Anya
This will track progress on reconstructing Anya. https://spy-x-family.fandom.com/wiki/Anya_Forger
Disclaimers:
- These results were produced on an A100 40GB GPU. You can try to reproduce them using `scripts/run_image_anya.sh`, but there's no guarantee you'll get the same results, especially if you reduce the batch size.
## TODOs
- [ ] give her a haircut after Zero123, by generating 6 standard views, allowing users to replace the side view with less hair (Vikram)
- [ ] optimize the runtime and VRAM usage (reduce batch size, number of iters) (claforte)
- [ ] improve quality with front reference view during SD SDS (claforte)
- [ ] try the DeepFloyd model instead of SD
- [ ] try a finetuned SD model for SDS, ideally one that knows about Anya, e.g. https://civitai.com/models/9462/anya-forger
- [ ] try to disable textureless mode - it may cause the hallucination of a 3rd arm
- [ ] debug why some batches are way too close to the subject, causing cropping
## 2023-05-16 evening
Best result so far... Result of phase 3 (~4 more hours):
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/1ead192f-4ead-43ce-9bfe-7a68ddee4810
Result of phase 2 (~3 hours on 1 A100?):
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/655168c5-3280-4726-91c8-ca07cef3339e
## 2023-05-16 improved Anya
Results of improved phase 2:
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/8f430deb-2ef8-4d0a-a1b4-f19090138a9d
Another attempt with different hyperparams:
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/479d8e4f-4446-42d7-b8b3-d930047ee279
(will provide details shortly)
Downsides:
- She doesn't quite look like Anya. The SDS and RGBD losses are constantly bickering over the two brown accessories on her head, and over the shape of her hair.
## 2023-05-15 Haircut
Took the side view of Anya and gave her a haircut:
(images: side view before -> side view after the haircut)
Then continued to train using two images: the front image, and the side image with the haircut:
The haircut is preserved well, and is obtained after just ~20 epochs.
Tried adding a third image, but it didn't improve results much. Tried training from scratch, but results were better when continuing to train from the last checkpoint.
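To make the two-image setup concrete, here's a rough sketch of the idea: each reference image is paired with its (approximately known) camera pose, and each reconstruction step supervises against one of them. The haircut file name and the azimuths are hypothetical placeholders, not the exact values from this run.

```python
import random

# Hypothetical reference set: the second file name and the azimuths are
# illustrative placeholders, not the exact values used here.
references = [
    {"image": "data/anya_front_rgba.png", "azimuth_deg": 0.0},
    {"image": "data/anya_side_haircut_rgba.png", "azimuth_deg": 90.0},
]

def sample_reference():
    # Each RGBD reconstruction step supervises from one known view.
    return random.choice(references)
```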
Tried tweaking other parameters:
- `textureless_ratio=0` did not help, so in fact `textureless_ratio > 0` helps (see the first sketch after this list).
- Progressive level is a sound idea, but the encoding is perhaps not initialized to imitate the identity function when the number of levels increases, so results turn bad when the level is changed and don't recover later (see the second sketch after this list).
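For reference, a minimal sketch of what `textureless_ratio` plausibly controls, assuming the shading mode is sampled per iteration; the function and mode names are illustrative, not the repo's exact code:

```python
import random

def sample_shading_mode(textureless_ratio: float = 0.2) -> str:
    # With probability textureless_ratio, render with a flat white albedo so
    # the guidance critiques geometry (shading/normals) alone; otherwise
    # render the learned texture. textureless_ratio=0 disables the mode.
    return "textureless" if random.random() < textureless_ratio else "lambertian"
```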
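And if that identity-function hypothesis is right, the fix would be to zero-initialize each newly enabled encoding level, so the output is unchanged at the moment the level count grows. A sketch, assuming one feature tensor per level (the layout and names are mine, not the actual encoder's):

```python
import torch

def enable_next_level(per_level_features: list, num_active: int) -> int:
    # Zero the incoming level's features so the encoding still computes the
    # same function right after the switch, instead of injecting random
    # features that the downstream MLP has never seen.
    with torch.no_grad():
        per_level_features[num_active].zero_()
    return num_active + 1
```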
## 2023-05-12 1st attempt
### Phased results
#### Phase 1 - Zero123 from 1 front reference image
The rough shape and colors of Anya are reconstructed, but:
- her face is concave
- her hair is excessively long
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/36701129-18f6-4451-89bf-93a1bfb9c571
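For context, phase 1 distills Zero123, which is conditioned only on the pose delta between the reference view and the sampled novel view (plus the CLIP image embedding of the reference). A sketch of that conditioning vector as described in the paper; the function name is mine:

```python
import math

def zero123_pose_conditioning(d_elevation_rad: float, d_azimuth_rad: float,
                              d_radius: float) -> list:
    # Relative-pose embedding from the Zero123 paper: elevation delta,
    # sin/cos of the azimuth delta, and the radius (scale) delta.
    return [d_elevation_rad, math.sin(d_azimuth_rad),
            math.cos(d_azimuth_rad), d_radius]
```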
#### Phase 2 - text prompt guidance using SD (2.1?)
Noticeable problems:
- Anya no longer looks the same (the rgbd guidance isn't applied yet in this phase?)
- The excess hair in the back transformed into an extra arm. Sigh.
- The ears are also deformed and asymmetrical.
- Her eyes are holes.
https://github.com/ashawkey/stable-dreamfusion/assets/5090329/2ba1daab-d2b4-4a9b-8193-780fde13125e
Most noticeable problems:
- Her arms are deformed
- Her eyes are dark pits of despair
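For context, phase 2 optimizes the NeRF with score distillation sampling (SDS) against the text prompt. A minimal sketch of the SDS update, assuming generic handles `unet` and `alphas_cumprod`; the repo's actual guidance code differs in details:

```python
import torch

def sds_loss(unet, alphas_cumprod, latents, cond_emb, uncond_emb,
             guidance_scale: float = 100.0) -> torch.Tensor:
    # Noise the rendered latents at a random timestep, ask the frozen
    # diffusion model to predict that noise, and push the render toward
    # what the model expects for the prompt.
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise
    with torch.no_grad():  # the diffusion model is frozen; only the NeRF learns
        e_cond = unet(noisy, t, cond_emb)
        e_uncond = unet(noisy, t, uncond_emb)
    e = e_uncond + guidance_scale * (e_cond - e_uncond)  # classifier-free guidance
    grad = (1 - a) * (e - noise)
    # Surrogate loss whose gradient w.r.t. `latents` equals `grad`.
    return (grad.detach() * latents).sum()
```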
## Specific improvements/ideas
### Reduce the jitter
The default jitter seems too high to me. The default is equivalent to `--jitter_pose --jitter_center 0.2 --jitter_target 0.2 --jitter_up 0.02`.
I noticed that when parts of the object are often cropped (e.g. Anya's feet), they rapidly diverge into incorrect configurations, e.g. the feet reverse so they point towards her back, and that causes other Janus problems to emerge.
For this reconstruction, I used instead `--jitter_pose --jitter_center 0.015 --jitter_target 0.015 --jitter_up 0.01`.
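To make those flags concrete, here's a rough sketch of what look-at pose jitter of this kind amounts to; this is an illustration of the idea, not the repo's actual camera-sampling code, and the uniform-noise assumption is mine:

```python
import numpy as np

def jitter_lookat(center, target, up,
                  jitter_center=0.015, jitter_target=0.015, jitter_up=0.01):
    # Perturb the camera position, the look-at target, and the up vector.
    # With large jitter (e.g. 0.2) the subject's extremities often leave the
    # frame; these smaller values keep Anya's feet in view.
    center = center + np.random.uniform(-jitter_center, jitter_center, 3)
    target = target + np.random.uniform(-jitter_target, jitter_target, 3)
    up = up + np.random.uniform(-jitter_up, jitter_up, 3)
    return center, target, up / np.linalg.norm(up)
```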
This is awesome!
- I have some results which show better hair:
```
python main.py -O --image data/anya_front_rgba.png --workspace trial_image_anya --iters 5000
```
https://github.com/ashawkey/stable-dreamfusion/assets/25863658/ea73c4b5-6e70-40df-89e1-d32d7b9cb5a8
After dmtet:
```
python main.py -O --image data/anya_front_rgba.png --workspace trial2_image_anya --iters 5000 --dmtet --init_with trial_image_anya/checkpoints/df.pth
```
https://github.com/ashawkey/stable-dreamfusion/assets/25863658/f64967c2-42f7-4370-b8aa-22adb53f9216
I think jitter should be disabled, as zero123 only supports camera rotation and scaling, while jitter introduces translation?
- Despite the loss of identity, the SD results are quite amazing! I think the SD guidance is so strong that it finally outweighs the RGB guidance, even though we do RGB guidance every iteration. I think the major problem is that stable-diffusion simply cannot generate the specific Anya Forger; maybe we have to apply some finetuned dreambooth model (but that's way too complicated compared to zero123).
> I think jitter should be disabled, as zero123 only supports camera rotation and scaling

Let me clarify: I don't use jitter during phase 1 (zero123), I only use it in later phases (using SD 2.x?).
> Despite the loss of identity, the SD results are quite amazing! I think the SD guidance is so strong that it finally outweighs the RGB guidance, even though we do RGB guidance every iteration.

I agree that the SD results are promising. I wasn't even providing the reference image in phases 2-3, but more fixes are required to handle the 2D conditioning with batch_size>1, etc. I'm working on it. :-)
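To pin down the schedule under discussion, a sketch of the interleave; every name here is a placeholder, and the balance between the two terms is exactly what's being tuned:

```python
def train_step(i: int, render, sds_step, rgbd_step, rgb_every: int = 1):
    # Each iteration: diffusion guidance (SD or zero123) on a random view.
    loss = sds_step(render("random_view"))
    # Plus, every `rgb_every` iterations, reconstruction of the known
    # reference view(s): the "RGB guidance every iteration" mentioned above.
    if i % rgb_every == 0:
        loss = loss + rgbd_step(render("reference_view"))
    return loss
```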
@ashawkey: I tweaked hyperparams a lot and ended up with much better results... see the updated description on top. :-) I'll share the script tomorrow morning.
Zero123 first author here. I'm thrilled to see the great efforts by both of you guys to improve zero123 results on SDS guidance @claforte @ashawkey ! If at any point I can be of any help or provide clarification of the paper and code, please don't hesitate to reach out!
@ruoshiliu Thanks for open-sourcing this great work! We will :-)