About Potential for Speed Improvement
Hi, I really appreciate your excellent work!
I am currently exploring the possibility of running the model in a real-time scenario, and I have two questions I would like to discuss.
- Are there any suggestions to further accelerate the inference speed?
- Do you have any plans to train and release a smaller version of the model, such as using ViT-S or other lightweight backbones, which might help improve real-time performance?
Thank you for your time and dedication.
Hi! Thanks for your interest. Here are my suggestions for accelerating the inference:
- Batchify the input image streams. Single-image inference may not fully utilize the GPU computation resources. Batch inference will be more efficient on GPUs.
- Trade off performance for speed with smaller input image sizes. You can configure this using the `resolution_level` argument in the `model.infer()` function. For example, setting `resolution_level=0` (the lowest) will significantly improve speed while causing an acceptable drop in performance. The overall accuracy should remain intact, though some details may be lost.
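As a minimal sketch of the batching suggestion (using NumPy with a stand-in linear "model", since the batch signature of the real `model.infer()` isn't shown here): stacking a stream of frames into one array lets a single call replace many small ones, which is where GPUs gain the most.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "model": a single linear map (hypothetical, for illustration only).
W = rng.standard_normal((64, 8)).astype(np.float32)

# Incoming image stream, flattened to vectors for this sketch.
frames = [rng.standard_normal(64).astype(np.float32) for _ in range(8)]

# Single-image inference: one call per frame; per-call overhead dominates on GPU.
per_image = [f @ W for f in frames]

# Batched inference: stack the stream and run one call over the whole batch.
batch = np.stack(frames)   # shape (8, 64)
batched = batch @ W        # shape (8, 8); per-frame results are unchanged
```

With the actual model, the same pattern would mean collecting N frames and passing them to `model.infer()` in one call, assuming it accepts batched input as suggested above.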
Regarding smaller models, we plan to release the ViT-Base model in the future. We have been making a few updates recently. Thanks for your patience!
Hi. We've recently tested its performance with fp16 precision and found that inference achieves a 2x speedup on GPU with no loss in evaluation scores and no visible distortion, though surface smoothness may be slightly inferior due to the limited precision. The code has been updated and now supports a `use_fp16` flag to enable native autocast.
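To illustrate why fp16 can preserve overall scores while the finest detail suffers, here is a small NumPy sketch (a generic demonstration of half-precision rounding, not the repository's actual autocast path): the same computation carried out in float16 tracks the float32 result closely, with a small bounded relative error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 64)).astype(np.float32)

# fp32 reference result.
ref = x @ W

# Same computation in fp16, cast back to fp32 for comparison.
out = (x.astype(np.float16) @ W.astype(np.float16)).astype(np.float32)

# fp16 carries roughly 3 decimal digits of precision, so the coarse
# structure survives while the finest details (e.g. surface smoothness)
# can degrade slightly.
max_rel_err = np.abs(ref - out).max() / np.abs(ref).max()
```

The relative error stays well below a percent-scale threshold here, which matches the observation that evaluation scores are unaffected while subtle smoothness can be.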
May I ask whether the lightweight model (e.g., with a ViT-S backbone) would be trained from scratch or distilled?