threestudio
more zero123 challenges + new zero123 defaults
Adds the zero123 challenges originally from stable-dreamfusion, and improves the zero123 configuration so it can reconstruct some of them.
NOTE: all experiments and results were run with zero123XL, a yet-to-be-released model from the primary Objaverse and Zero123 authors. (Thanks Ruoshi and Matt for sharing it with us!) The current official zero123 model will likely not produce results as good and/or will take more steps to converge.
- Increased the camera distance to reduce the risk of cropped zero123 predictions and floaters.
- Reduced the rate of white backgrounds from 50% to 20%.
  a. IIRC this sped up convergence and improved zero123 predictions, since zero123 is trained almost exclusively on white backgrounds.
  b. I tried 10%, which produced better results for most scenes, except it caused the baby phoenix to temporarily grow lots of big white floaters.
- Reduced the NeRF render resolution from 128x128 to 64x64. This freed up VRAM, which I used to increase the batch size to 12 (to fit in 40GB VRAM) and increase the learning rate (5x), for faster/more reliable convergence.
- Use a hashgrid (not progressive) by default. It converges more reliably.
- Use accumulate (not alternate) by default. I haven't compared them lately, but accumulate converged more reliably in previous experiments, and its loss was more regular and easier to compare.
- Rebalanced all the lambdas for faster, more reliable convergence. Now `lambda_sds` dominates the rest. (Example for teddy:)
- Decreased `max_steps` from 10000 to 1999. Honestly, 1000 would be enough for all the examples I looked at.
- Added scripts to run zero123 comparisons and log results to wandb.
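Taken together, the defaults above boil down to a config delta roughly like the sketch below. The key names here are my guesses at threestudio's config schema, not copied from the actual `configs/zero123.yaml`, and values I don't know are left elided:

```yaml
# Sketch only: key names are illustrative, check configs/zero123.yaml for the real schema.
data:
  height: 64                    # NeRF render resolution, down from 128
  width: 64
  batch_size: 12                # fits in 40GB VRAM at 64x64
  default_camera_distance: ...  # increased, to avoid cropped predictions/floaters
system:
  background:
    white_prob: 0.2             # down from 0.5 (hypothetical key name)
  loss:
    lambda_sds: ...             # rebalanced to dominate the other loss terms
trainer:
  max_steps: 1999               # down from 10000
```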
Results (all using zero123XL unless noted otherwise):
The zero123XL guidance seems excellent, even very early (around step 100):
The raw experiments are in https://stability.wandb.io/threestudio/claforte-noise_atten
For comparison, here are the earlier results from a few months ago, using stable-dreamfusion at the time: https://github.com/ashawkey/stable-dreamfusion/issues/294
Anya (zero123XL):
https://github.com/threestudio-project/threestudio/assets/5090329/0cf5b5de-2a72-45bd-b5f7-1e58726570ec
Anya (default zero123 - 105000.ckpt):
https://github.com/threestudio-project/threestudio/assets/5090329/e6eaa2e4-c199-4937-b549-24dc9d73d82a
Beach house 1:
https://github.com/threestudio-project/threestudio/assets/5090329/2f9f35b9-a83c-4627-91cf-dc86f45745d4
Beach house 2:
https://github.com/threestudio-project/threestudio/assets/5090329/58b3a637-db6c-4392-b7ea-7bfc0ee54942
Bollywood actress:
https://github.com/threestudio-project/threestudio/assets/5090329/6012ef02-ca36-4625-bc06-f3da07b2e8e8
Hamburger:
https://github.com/threestudio-project/threestudio/assets/5090329/38ba3d3e-76ba-46da-a020-c2400d077852
Cactus:
https://github.com/threestudio-project/threestudio/assets/5090329/1f000c85-9565-4fed-bf3e-e9d549fdd3e0
Cat statue:
https://github.com/threestudio-project/threestudio/assets/5090329/315f7ca5-2ab7-4a94-956f-40cb01ffdfce
https://github.com/threestudio-project/threestudio/assets/5090329/c20f5830-fd40-45f9-a998-514fa868cfc9
https://github.com/threestudio-project/threestudio/assets/5090329/5981ef5d-09d8-4321-8fca-2ea16111bf88
https://github.com/threestudio-project/threestudio/assets/5090329/4ed68fa1-4c2f-4bc0-a9e7-85e9aa069758
https://github.com/threestudio-project/threestudio/assets/5090329/bf36d78a-3cae-488a-84da-38bb7d804b93
https://github.com/threestudio-project/threestudio/assets/5090329/04e0481c-763c-4880-8a56-67cbc514fb19
Postponed ideas:
- Increasing `system.geometry.mlp_network_config.n_hidden_layers` highly sped up convergence (2x?).
  a. I didn't keep it because I didn't want to complicate later phases, which usually rely on 1 hidden layer.
- The quality of the models plateaus around 1000 steps. I kept a 2000-step budget in case harder models take longer.
Remaining problems:
- The NeRF doesn't seem to learn enough from the zero123 guidance. In the beach house examples, it tends to approximate the guidance with a very limited palette of colors (e.g. 32 colors) that appear stippled on the models, instead of learning nice gradients.
  a. I tried a bunch of experiments that didn't help, e.g. zeroing out/increasing/decreasing the weight decay, zeroing out loss terms, etc. I think there's a problem with the way we train the NeRF.
  b. I wanted to try a regular grid (instead of a hash grid) in case that helps, but I ran into hydra configuration issues... so that's left for later.
LGTM! Minor comments only
Looks good! The normal maps are noisy though. Have you considered using similar techniques as in `textmesh-if.yaml`? i.e., finite-difference normals + progressively changing eps, as proposed in Neuralangelo.
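To make the suggestion concrete, here is a minimal numpy sketch of finite-difference normals with a Neuralangelo-style shrinking eps. This is not threestudio's implementation; the function names and schedule constants are made up for illustration:

```python
import numpy as np

def finite_difference_normal(field, x, eps):
    """Central-difference gradient of a scalar field, normalized into a normal."""
    offsets = np.eye(3) * eps
    grad = np.stack([
        (field(x + offsets[i]) - field(x - offsets[i])) / (2.0 * eps)
        for i in range(3)
    ], axis=-1)
    return grad / (np.linalg.norm(grad, axis=-1, keepdims=True) + 1e-8)

def annealed_eps(step, max_steps, coarse=1e-1, fine=1e-3):
    """Neuralangelo-style schedule: eps shrinks log-linearly from coarse to fine,
    so early normals are smooth and later normals capture fine detail."""
    t = min(step / max_steps, 1.0)
    return coarse * (fine / coarse) ** t

# Toy field: a unit-sphere SDF, whose true normal at p is p / |p|.
sphere = lambda p: np.linalg.norm(p, axis=-1) - 1.0
n = finite_difference_normal(sphere, np.array([0.0, 0.0, 2.0]),
                             annealed_eps(step=0, max_steps=1000))
```

The eps schedule matters because a large step size low-pass filters the gradient (smoother, less noisy normals), while a small one recovers high-frequency geometry once training has stabilized.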
If you want to use dense grids, you can use the following config:

```yaml
otype: Grid
type: Dense
n_levels: xx
n_features_per_level: xx
base_resolution: xx
per_level_scale: xx
```

See the tiny-cuda-nn documentation for details.
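For intuition, here is a toy numpy version of what a dense (non-hashed) multi-level grid encoding computes: every level stores a feature vector at every grid vertex, and a query point gets the trilinear blend of its 8 surrounding vertices at each level. This only illustrates the idea; it is not tiny-cuda-nn's actual implementation:

```python
import numpy as np

class DenseGridEncoding:
    """Toy dense multi-level feature grid with trilinear interpolation.
    Unlike a hashgrid, every level stores features for ALL vertices, so memory
    grows with resolution^3 but there are no hash collisions."""

    def __init__(self, n_levels=2, n_features_per_level=2,
                 base_resolution=4, per_level_scale=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.levels = []
        for level in range(n_levels):
            res = int(base_resolution * per_level_scale ** level)
            # (res+1)^3 vertices, each holding its own learnable feature vector
            self.levels.append(rng.normal(
                0.0, 1e-2, size=(res + 1, res + 1, res + 1, n_features_per_level)))

    def __call__(self, x):
        """Encode a point x in [0,1]^3 into concatenated per-level features."""
        feats = []
        for grid in self.levels:
            res = grid.shape[0] - 1
            p = np.clip(x, 0.0, 1.0) * res
            i = np.minimum(p.astype(int), res - 1)  # lower-corner vertex index
            f = p - i                               # fractional position in the cell
            out = 0.0
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((f[0] if dx else 1 - f[0])
                             * (f[1] if dy else 1 - f[1])
                             * (f[2] if dz else 1 - f[2]))
                        out = out + w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
            feats.append(out)
        return np.concatenate(feats)

enc = DenseGridEncoding()
z = enc(np.array([0.5, 0.5, 0.5]))  # length n_levels * n_features_per_level
```

In real training the grid arrays would be optimized parameters; the point of trying Dense here would be to rule out hash collisions as the cause of the stippled-color artifacts.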
I didn't realize that a `git rebase main` followed by `git push --force` automatically dismisses a review. Sorry, next time I'll `git merge main` instead...
(Nothing has really changed, except I deleted one obsolete script)
I used to do this very often (rebase + force push), until one day I accidentally overrode some commits and it took me great effort to find them again... Conclusion: `push --force` is so dangerous that we should definitely avoid using it :(
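For what it's worth, commits clobbered this way are usually still recoverable for a while, because the reflog keeps referencing them. A sketch in a throwaway demo repo (all file names and messages are made up):

```shell
# Demo: "lose" a commit, then recover it from the reflog.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
echo one > notes.txt && git add notes.txt && git commit -qm "first"
echo two > notes.txt && git commit -qam "second"
git reset --hard -q HEAD~1   # "second" is now unreachable from the branch
# ...but the reflog still knows about it:
lost=$(git log -g --format='%h %s' | awk '/second/{print $1; exit}')
git branch recovered "$lost" # pin it to a branch before gc can prune it
```

Also, `git push --force-with-lease` is a safer drop-in for `--force`: it refuses to overwrite remote commits you haven't fetched yet.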
BTW thanks @bennyguo for these ideas... I'll try them early next week. Also thanks for approving the review in the middle of your night!
Just about to sleep. Have a good weekend!