threestudio
more zero123 challenges + new zero123 defaults
Adds the zero123 challenges originally from stable-dreamfusion, and improves the zero123 configuration so it can reconstruct some of them.
NOTE: all experiments and results were run with zero123XL, a yet-to-be-released model from the primary Objaverse and Zero123 authors. (Thanks Ruoshi and Matt for sharing it with us!) The current official zero123 model will likely not produce results as good and/or will take more steps to converge.
- Increased the camera distance to reduce the risk of cropped zero123 predictions and floaters.
- Reduced the rate of white backgrounds from 50% to 20%.
  a. IIRC this sped up convergence and improved zero123 predictions, since zero123 is trained almost exclusively on white backgrounds.
  b. I tried 10%, which produced better results for most scenes, except it caused the baby phoenix to temporarily grow lots of big white floaters.
- Reduced the NeRF render resolution from 128x128 to 64x64. This freed up VRAM, which I used to increase the batch size to 12 (to fit in 40GB VRAM) and increase the learning rate (5x), for faster/more reliable convergence.
- Use a hashgrid (not progressive) by default. It converges more reliably.
- Use accumulate (not alternate) by default. I haven't compared them lately, but accumulate converged more reliably in previous experiments, and its loss was more regular and easier to compare.
- Rebalanced all the lambdas for faster, more reliable convergence. Now `lambda_sds` dominates the rest. (Example for teddy:)
- Decreased `max_steps` from 10000 to 1999. Honestly, 1000 would be enough for all the examples I looked at.
- Added scripts to run zero123 comparisons and log results to wandb.
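Taken together, the defaults above boil down to a config delta roughly like the sketch below. The key names here are my guesses at threestudio's config schema, not copied from the actual `configs/zero123.yaml`, and values I don't know are left elided:

```yaml
# Sketch only: key names are illustrative, check configs/zero123.yaml for the real schema.
data:
  height: 64                    # NeRF render resolution, down from 128
  width: 64
  batch_size: 12                # fits in 40GB VRAM at 64x64
  default_camera_distance: ...  # increased, to avoid cropped predictions/floaters
system:
  background:
    white_prob: 0.2             # down from 0.5 (hypothetical key name)
  loss:
    lambda_sds: ...             # rebalanced to dominate the other loss terms
trainer:
  max_steps: 1999               # down from 10000
```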
Results (all using zero123XL unless noted otherwise):
The zero123XL guidance seems excellent, even very early (around step 100):
The raw experiments are in https://stability.wandb.io/threestudio/claforte-noise_atten
For comparison, here are the earlier results from a few months ago, using stable-dreamfusion at the time: https://github.com/ashawkey/stable-dreamfusion/issues/294
Anya (zero123XL):
https://github.com/threestudio-project/threestudio/assets/5090329/0cf5b5de-2a72-45bd-b5f7-1e58726570ec
Anya (default zero123 - 105000.ckpt):
https://github.com/threestudio-project/threestudio/assets/5090329/e6eaa2e4-c199-4937-b549-24dc9d73d82a
Beach house 1:
https://github.com/threestudio-project/threestudio/assets/5090329/2f9f35b9-a83c-4627-91cf-dc86f45745d4
Beach house 2:
https://github.com/threestudio-project/threestudio/assets/5090329/58b3a637-db6c-4392-b7ea-7bfc0ee54942
Bollywood actress:
https://github.com/threestudio-project/threestudio/assets/5090329/6012ef02-ca36-4625-bc06-f3da07b2e8e8
Hamburger:
https://github.com/threestudio-project/threestudio/assets/5090329/38ba3d3e-76ba-46da-a020-c2400d077852
Cactus:
https://github.com/threestudio-project/threestudio/assets/5090329/1f000c85-9565-4fed-bf3e-e9d549fdd3e0
Cat statue:
https://github.com/threestudio-project/threestudio/assets/5090329/315f7ca5-2ab7-4a94-956f-40cb01ffdfce
https://github.com/threestudio-project/threestudio/assets/5090329/c20f5830-fd40-45f9-a998-514fa868cfc9
https://github.com/threestudio-project/threestudio/assets/5090329/5981ef5d-09d8-4321-8fca-2ea16111bf88
https://github.com/threestudio-project/threestudio/assets/5090329/4ed68fa1-4c2f-4bc0-a9e7-85e9aa069758
https://github.com/threestudio-project/threestudio/assets/5090329/bf36d78a-3cae-488a-84da-38bb7d804b93
https://github.com/threestudio-project/threestudio/assets/5090329/04e0481c-763c-4880-8a56-67cbc514fb19
Postponed ideas:
- Increasing `system.geometry.mlp_network_config.n_hidden_layers` highly sped up convergence (2x?).
  a. I didn't keep it because I didn't want to complicate later phases, which usually rely on 1 hidden layer.
- The quality of the models plateaus around 1000 steps. I kept a 2000-step budget in case harder models take longer.
Remaining problems:
- The NeRF doesn't seem to learn enough from the zero123 guidance. In the beach house examples, it tends to approximate the guidance with a very limited palette of colors (e.g. 32 colors) that appear stippled on the models, instead of learning nice gradients.
  a. I tried a bunch of experiments that didn't help, e.g. zeroing out/increasing/decreasing the weight decay, zeroing out loss terms, etc. I think there's a problem with the way we train the NeRF.
  b. I wanted to try a regular grid (instead of a hash grid) in case that helps, but I ran into hydra configuration issues... so that's left for later.
LGTM! Minor comments only
Looks good! The normal maps are noisy though. Have you considered using similar techniques as in `textmesh-if.yaml`? i.e., finite-difference normals + progressively changing eps, as proposed in Neuralangelo.
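To make the suggestion concrete, here is a minimal numpy sketch of finite-difference normals with a Neuralangelo-style shrinking eps. This is not threestudio's implementation; the function names and schedule constants are made up for illustration:

```python
import numpy as np

def finite_difference_normal(field, x, eps):
    """Central-difference gradient of a scalar field, normalized into a normal."""
    offsets = np.eye(3) * eps
    grad = np.stack([
        (field(x + offsets[i]) - field(x - offsets[i])) / (2.0 * eps)
        for i in range(3)
    ], axis=-1)
    return grad / (np.linalg.norm(grad, axis=-1, keepdims=True) + 1e-8)

def annealed_eps(step, max_steps, coarse=1e-1, fine=1e-3):
    """Neuralangelo-style schedule: eps shrinks log-linearly from coarse to fine,
    so early normals are smooth and later normals capture fine detail."""
    t = min(step / max_steps, 1.0)
    return coarse * (fine / coarse) ** t

# Toy field: a unit-sphere SDF, whose true normal at p is p / |p|.
sphere = lambda p: np.linalg.norm(p, axis=-1) - 1.0
n = finite_difference_normal(sphere, np.array([0.0, 0.0, 2.0]),
                             annealed_eps(step=0, max_steps=1000))
```

The eps schedule matters because a large step size low-pass filters the gradient (smoother, less noisy normals), while a small one recovers high-frequency geometry once training has stabilized.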
If you want to use dense grids, you can use the following config:

```yaml
otype: Grid
type: Dense
n_levels: xx
n_features_per_level: xx
base_resolution: xx
per_level_scale: xx
```

See the tiny-cuda-nn documentation for details.
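For intuition, here is a toy numpy version of what a dense (non-hashed) multi-level grid encoding computes: every level stores a feature vector at every grid vertex, and a query point gets the trilinear blend of its 8 surrounding vertices at each level. This only illustrates the idea; it is not tiny-cuda-nn's actual implementation:

```python
import numpy as np

class DenseGridEncoding:
    """Toy dense multi-level feature grid with trilinear interpolation.
    Unlike a hashgrid, every level stores features for ALL vertices, so memory
    grows with resolution^3 but there are no hash collisions."""

    def __init__(self, n_levels=2, n_features_per_level=2,
                 base_resolution=4, per_level_scale=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.levels = []
        for level in range(n_levels):
            res = int(base_resolution * per_level_scale ** level)
            # (res+1)^3 vertices, each holding its own learnable feature vector
            self.levels.append(rng.normal(
                0.0, 1e-2, size=(res + 1, res + 1, res + 1, n_features_per_level)))

    def __call__(self, x):
        """Encode a point x in [0,1]^3 into concatenated per-level features."""
        feats = []
        for grid in self.levels:
            res = grid.shape[0] - 1
            p = np.clip(x, 0.0, 1.0) * res
            i = np.minimum(p.astype(int), res - 1)  # lower-corner vertex index
            f = p - i                               # fractional position in the cell
            out = 0.0
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((f[0] if dx else 1 - f[0])
                             * (f[1] if dy else 1 - f[1])
                             * (f[2] if dz else 1 - f[2]))
                        out = out + w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
            feats.append(out)
        return np.concatenate(feats)

enc = DenseGridEncoding()
z = enc(np.array([0.5, 0.5, 0.5]))  # length n_levels * n_features_per_level
```

In real training the grid arrays would be optimized parameters; the point of trying Dense here would be to rule out hash collisions as the cause of the stippled-color artifacts.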
I didn't realize that a `git rebase main` followed by `git push --force` automatically dismisses a review. Sorry, next time I'll `git merge main` instead...
(Nothing has really changed, except I deleted one obsolete script)
I used to do this very often (rebase + force push), until one day I accidentally overrode some commits and it took me great effort to find them again... Conclusion: `push --force` is so dangerous that we should definitely avoid using it :(
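For what it's worth, commits clobbered this way are usually still recoverable for a while, because the reflog keeps referencing them. A sketch in a throwaway demo repo (all file names and messages are made up):

```shell
# Demo: "lose" a commit, then recover it from the reflog.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
echo one > notes.txt && git add notes.txt && git commit -qm "first"
echo two > notes.txt && git commit -qam "second"
git reset --hard -q HEAD~1   # "second" is now unreachable from the branch
# ...but the reflog still knows about it:
lost=$(git log -g --format='%h %s' | awk '/second/{print $1; exit}')
git branch recovered "$lost" # pin it to a branch before gc can prune it
```

Also, `git push --force-with-lease` is a safer drop-in for `--force`: it refuses to overwrite remote commits you haven't fetched yet.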
BTW thanks @bennyguo for these ideas... I'll try them early next week. Also thanks for approving the review in the middle of your night!
Just about to sleep. Have a good weekend!