Loss Instability and Grid Artifacts in MoGe Reproduction
Hi, I'm reproducing MoGe with your training code (unmodified). Two issues:
[Figure: original MoGe vs. our reproduction]
With normal loss:
The loss spikes suddenly during training (around 20k steps), and the visualizations (depth/normals) collapse. Any advice on stabilizing it? Could it be related to GT normal computation or outlier handling?
Without normal loss:
Training stabilizes, but the normals still show grid-like, unsmooth patterns (see attached figures). Why might this happen, and why do the metrics improve despite worse visuals?
Details:
Environment: bs=8 on 8 H20 GPUs. Dataset: processed by us, with a dataset combination similar to your setup. Here is a visualization example of one batch.
What’s the role of normal loss here, and how should I adjust dataset processing to match your results? Thanks!
Hello, I met the same problem: our train-from-scratch model predicts worse normals than the MoGe-1 ViT-L version, with grid artifacts.
The smaller one (front) is from MoGe-1; the larger one is ours.
Hi. Sorry for the late response. We haven't encountered divergence in training, but we do have a fix for the normal loss in MoGe-2 to improve theoretical stability. MoGe used a normal loss of the form $\angle (\vec n_\text{pred}, \vec n_\text{gt})$, where the normals are computed as the cross product of the edges to neighboring points. In MoGe-2, it is simplified to $\angle (\vec e_\text{pred}, \vec e_\text{gt})$, where $\vec e$ is the edge to a neighboring pixel. We will update the training scripts and losses soon to include these modifications.
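A minimal PyTorch sketch contrasting the two formulations (illustrative only, not the exact training code; `points` is assumed to be a `(B, H, W, 3)` point map, and the helper names are hypothetical):

```python
import torch
import torch.nn.functional as F

def _edges(points: torch.Tensor):
    """Edges to the right and bottom neighboring pixels, cropped to a common size."""
    e_right = points[:, :-1, 1:, :] - points[:, :-1, :-1, :]
    e_down = points[:, 1:, :-1, :] - points[:, :-1, :-1, :]
    return e_right, e_down

def _angle(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-6):
    """Angle between vectors along the last dim (clamped for acos stability)."""
    cos = F.cosine_similarity(a, b, dim=-1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps))

def normal_angle_loss(pred: torch.Tensor, gt: torch.Tensor):
    """MoGe(-1)-style: angle between normals from cross products of edges.
    The cross product degenerates when the edges are near-parallel, which can
    make the gradient unstable."""
    n_pred = torch.cross(*_edges(pred), dim=-1)
    n_gt = torch.cross(*_edges(gt), dim=-1)
    return _angle(n_pred, n_gt).mean()

def edge_angle_loss(pred: torch.Tensor, gt: torch.Tensor):
    """MoGe-2-style simplification: angles between corresponding edges directly,
    avoiding the cross product entirely."""
    pe_r, pe_d = _edges(pred)
    ge_r, ge_d = _edges(gt)
    return 0.5 * (_angle(pe_r, ge_r).mean() + _angle(pe_d, ge_d).mean())
```

The point of the simplification: the cross product vanishes when the two edges are nearly parallel, so the angle between normals has exploding gradients in that degenerate case, while comparing edge directions directly does not.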
Hi. Any updates? I've met similar problems with the normal loss and got poor surface normals. @EasternJournalist