Loss Instability and Grid Artifacts in MoGe Reproduction
Hi, I'm reproducing MoGe with your training code (unmodified). Two issues:
[Figure: original MoGe vs. our reproduction]
With normal loss:
The loss spikes suddenly during training (around 20k steps), and the visualizations (depth/normals) collapse. Any advice on stabilizing it? Could it be related to GT normal computation or outlier handling?
Without normal loss:
Training stabilizes, but the normals still show grid-like, unsmooth patterns (see attached figures). Why might this happen, and why do the metrics improve despite worse visuals?
Details:
Environment: bs=8 on 8 H20 GPUs. Dataset: processed by us, with a dataset combination similar to your setup. Here is a visualization example of one batch.
What’s the role of normal loss here, and how should I adjust dataset processing to match your results? Thanks!
Hello, I met the same problem: our train-from-scratch model predicts worse normals than the MoGe-1 ViT-L version, with grid artifacts.
The smaller one (front) is from MoGe-1; the larger one is ours.
Hi. Sorry for the late response. We haven't encountered divergence in training, but we do have a fix for the normal loss in MoGe-2 to improve theoretical stability. MoGe used a normal loss of the form $\angle (\vec n_\text{pred}, \vec n_\text{gt})$, where the normals are computed as the cross product of the edges to neighboring points. In MoGe-2, it is simplified to $\angle (\vec e_\text{pred}, \vec e_\text{gt})$, where $\vec e$ is the edge to a neighboring pixel. We will update the training scripts and losses soon to include these modifications.
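A minimal PyTorch sketch contrasting the two formulations (illustrative only, not the exact training code; `points` is assumed to be a `(B, H, W, 3)` point map, and the helper names are hypothetical):

```python
import torch
import torch.nn.functional as F

def _edges(points: torch.Tensor):
    """Edges to the right and bottom neighboring pixels, cropped to a common size."""
    e_right = points[:, :-1, 1:, :] - points[:, :-1, :-1, :]
    e_down = points[:, 1:, :-1, :] - points[:, :-1, :-1, :]
    return e_right, e_down

def _angle(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-6):
    """Angle between vectors along the last dim (clamped for acos stability)."""
    cos = F.cosine_similarity(a, b, dim=-1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps))

def normal_angle_loss(pred: torch.Tensor, gt: torch.Tensor):
    """MoGe(-1)-style: angle between normals from cross products of edges.
    The cross product degenerates when the edges are near-parallel, which can
    make the gradient unstable."""
    n_pred = torch.cross(*_edges(pred), dim=-1)
    n_gt = torch.cross(*_edges(gt), dim=-1)
    return _angle(n_pred, n_gt).mean()

def edge_angle_loss(pred: torch.Tensor, gt: torch.Tensor):
    """MoGe-2-style simplification: angles between corresponding edges directly,
    avoiding the cross product entirely."""
    pe_r, pe_d = _edges(pred)
    ge_r, ge_d = _edges(gt)
    return 0.5 * (_angle(pe_r, ge_r).mean() + _angle(pe_d, ge_d).mean())
```

The point of the simplification: the cross product vanishes when the two edges are nearly parallel, so the angle between normals has exploding gradients in that degenerate case, while comparing edge directions directly does not.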
Hi. Any updates? I've met similar problems with the normal loss and got poor surface normals. @EasternJournalist