Ruicheng Wang comments

49 comments by Ruicheng Wang

About Potential for Speed Improvement

Hi! Thanks for your interest. Here are my suggestions for accelerating the inference: 1. Batchify the input image streams. Single-image inference may not fully utilize the GPU computation resources. Batch...
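A minimal, framework-agnostic sketch of the batching suggestion: frames are grouped into fixed-size chunks, and one batched model call runs per chunk instead of one call per frame. `infer_batch` is a hypothetical stand-in for whatever batched forward pass you use. With PyTorch it would typically stack the frames into a (B, 3, H, W) tensor, which requires all frames in a batch to share one resolution.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def batched_inference(
    frames: Iterable[T],
    infer_batch: Callable[[List[T]], Sequence[R]],  # hypothetical batched model call
    batch_size: int = 8,
) -> List[R]:
    """Run one batched call per chunk and return per-frame outputs in input order."""
    results: List[R] = []
    for batch in chunked(frames, batch_size):
        outputs = infer_batch(batch)
        if len(outputs) != len(batch):
            raise RuntimeError("infer_batch must return one output per input frame")
        results.extend(outputs)
    return results
```

The right `batch_size` depends on GPU memory and input resolution; larger batches generally improve utilization until memory runs out.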

About Potential for Speed Improvement

Hi. We've recently tested its performance with fp16 precision and found that inference achieves **a 2x speedup on GPU with no loss in evaluation scores and no visual distortion**, though...
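One way to see why half precision can leave outputs such as depth visually unchanged: IEEE 754 binary16 keeps an 11-bit significand, so rounding a normal value changes it by at most 2^-11 (about 0.05%) relative. The sketch below uses Python's standard `struct` "e" (binary16) format to round synthetic depth values; the depth range is made up for illustration. It only measures output rounding, not error accumulated in intermediate activations or overflow above 65504, which is why mixed-precision autocasting is often used instead of casting every operation.

```python
import struct
from typing import Iterable


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_relative_error(values: Iterable[float]) -> float:
    """Largest relative change caused by fp16 rounding over non-zero values."""
    return max(abs(to_fp16(v) - v) / abs(v) for v in values if v != 0)


# Synthetic depths from 0.3 m to roughly 27.7 m (illustrative, not model output).
depths_m = [0.3 + 0.0137 * i for i in range(2000)]
err = max_relative_error(depths_m)
print(f"max relative fp16 rounding error: {err:.2e}")
```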

Loss Instability and Grid Artifacts in MoGe Reproduction

Hi. Sorry for the late response. We haven't encountered divergence in training, but we do have a fix for the normal loss in MoGe-2 that improves its theoretical stability. MoGe used the...

Why don't the results I get locally match the HF demo on your website?

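The website shows MoGe-1 results. Locally, the default model is now MoGe-2, which was trained on different data, so its mask prediction behaves differently from MoGe-1's. This pattern is caused by an incorrect mask estimate (the model cannot decide whether the background should be kept as a solid wall or removed as non-solid background). If the mask fails, you can set apply_mask to False and use only the raw depth.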

Why don't the results I get locally match the HF demo on your website?

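Accuracy will not get worse. The only difference apply_mask makes is whether the depth of sky or plain-colored background is replaced with inf; it does not affect the depth prediction itself. However, for images that really do contain sky, if you don't apply the mask, the sky region keeps meaningless depth values, which can look odd.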
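A small sketch of the behavior described above, using plain nested lists: masking only swaps invalid pixels for inf, so valid pixels keep exactly the same depth whether or not the mask is applied. `masked_depth` is a hypothetical helper written for illustration, not MoGe's implementation; in MoGe itself the switch is the `apply_mask` option mentioned in the reply.

```python
import math
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def masked_depth(raw_depth: Grid, mask: Mask, apply_mask: bool = True) -> Grid:
    """Hypothetical helper: set depth to inf where mask is False, if apply_mask is set.

    With apply_mask=False the raw depth is returned unchanged (as a copy),
    including meaningless values in regions such as sky.
    """
    if not apply_mask:
        return [row[:] for row in raw_depth]
    return [
        [d if keep else math.inf for d, keep in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(raw_depth, mask)
    ]
```

Keeping the raw depth and the mask separately means the masking can still be applied later, for example only for visualization.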

Normalized camera intrinsics

Hi! The normalized camera intrinsics are defined such that the top-left corner of the image corresponds to (0, 0) and the bottom-right corner corresponds to (1, 1), whereas pixel-space intrinsics...
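With the convention above (image corners at (0, 0) and (1, 1)), normalized coordinates are pixel coordinates divided by the image width W and height H, so converting to pixel-space intrinsics amounts to scaling the first row of K by W and the second by H. The sketch below assumes a pixel frame whose top-left image corner is at (0, 0) and bottom-right corner at (W, H); if your pipeline instead places pixel centers at integer coordinates, the principal point shifts by half a pixel, which the optional flag handles. The function name is hypothetical.

```python
from typing import List

Matrix3 = List[List[float]]


def normalized_to_pixel_intrinsics(
    k_norm: Matrix3,
    width: int,
    height: int,
    pixel_centers_at_integers: bool = False,
) -> Matrix3:
    """Convert normalized intrinsics (image spans [0, 1] x [0, 1]) to pixel units.

    Equivalent to diag(width, height, 1) @ k_norm, plus an optional -0.5 shift of
    the principal point for the pixel-center convention.
    """
    fx = k_norm[0][0] * width
    skew = k_norm[0][1] * width  # usually 0
    cx = k_norm[0][2] * width
    fy = k_norm[1][1] * height
    cy = k_norm[1][2] * height
    if pixel_centers_at_integers:
        cx -= 0.5
        cy -= 0.5
    return [[fx, skew, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
```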

Normalized camera intrinsics

> Thank you for the reply! Are (H,W) the size of neural network inputs or the original image size? For the current demo of camera intrinsic prediction, are the output...

real life metric measurements with Ruicheng/moge-2-vitl-normal

Hi. It looks like you're interpreting the predicted intrinsics as if they were in pixel space, which leads to an incorrect FOV computation of nearly 180 degrees. You're currently using: ``` fov_x_rad...
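A sketch of why mixing conventions gives nearly 180 degrees. Because the normalized image has width 1, the horizontal FOV is 2·atan(0.5 / fx) with the normalized focal length fx. Plugging a pixel width into the same formula alongside a normalized focal length makes the atan argument huge, so each half-angle approaches 90 degrees. The user's original snippet is truncated in this listing, so the "wrong" call below is only an assumed reconstruction of that kind of mistake; the function names are hypothetical.

```python
import math
from typing import Tuple


def fov_degrees_from_normalized(fx_n: float, fy_n: float) -> Tuple[float, float]:
    """Horizontal/vertical FOV from normalized focal lengths (image spans [0, 1])."""
    fov_x = 2.0 * math.atan(0.5 / fx_n)  # half-width of the normalized image is 0.5
    fov_y = 2.0 * math.atan(0.5 / fy_n)  # half-height of the normalized image is 0.5
    return math.degrees(fov_x), math.degrees(fov_y)


def fov_x_degrees_from_pixels(fx_px: float, width_px: int) -> float:
    """Same quantity from pixel-space intrinsics: 2 * atan(W / (2 * fx))."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * fx_px)))


fx_n, width = 0.8, 1024
# Mistake: pixel width combined with a normalized focal length -> close to 180 degrees.
wrong = fov_x_degrees_from_pixels(fx_n, width)
# Correct: either stay normalized, or convert fx to pixels first (fx_px = fx_n * W).
right, _ = fov_degrees_from_normalized(fx_n, fx_n)
right_px = fov_x_degrees_from_pixels(fx_n * width, width)
print(f"wrong: {wrong:.2f} deg, right: {right:.2f} deg, via pixels: {right_px:.2f} deg")
```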

real life metric measurements with Ruicheng/moge-2-vitl-normal

I'm sorry, but the model is not expected to predict an accurate scale for this image. Estimating metric scale requires the model to recognize common objects with known size as...

Question about resamplers in V2

We use `conv_transpose` for the first three upsamplers and `bilinear` for the last one, as specified in the model configuration from the pretrained checkpoint: ``` "resamplers": ["conv_transpose", "conv_transpose", "conv_transpose", "bilinear"]...
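Rather than hard-coding the per-stage choice, one can read it from the checkpoint's configuration. The sketch below searches a loaded config dictionary for the `"resamplers"` key quoted above. Where that key sits inside the real MoGe-2 config is not shown in this comment, so the nested layout in the usage note is an assumption, and restricting values to the two types named here is also an assumption.

```python
from typing import Any, List, Optional

# Assumption: only the two resampler types mentioned in the reply are expected.
KNOWN_RESAMPLERS = {"conv_transpose", "bilinear"}


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for `key` in nested dicts/lists; returns the first match."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def resampler_plan(config: dict) -> List[str]:
    """Return the per-stage resampler types listed in a (nested) config dict."""
    kinds = find_key(config, "resamplers")
    if kinds is None:
        raise KeyError("no 'resamplers' entry found in config")
    unknown = [k for k in kinds if k not in KNOWN_RESAMPLERS]
    if unknown:
        raise ValueError(f"unexpected resampler types: {unknown}")
    return list(kinds)
```

For a hypothetical config such as `{"model": {"decoder": {"resamplers": [...]}}}`, `resampler_plan` returns the list in stage order, so the first three entries would be `conv_transpose` and the last `bilinear`.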