Jianyuan Wang comments

Results: 238 comments of Jianyuan Wang

input image shape height > width

Hi @yaseryacoob, as discussed above, the recommendation of a 1x1 aspect ratio isn't necessarily a general rule and might be optimal only in specific scenarios. From my personal experiments,...

input image shape height > width

I may see the problem. During training, all images have a width of 518, so the model will predict the field of view based on the ratio between...
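As a rough illustration of the point about the fixed training width, the sketch below computes a target size for a portrait (height > width) image so that its width matches 518 while the aspect ratio is kept. The function name `resize_to_training_width` is hypothetical, and rounding the height to a multiple of 14 is an assumption made for this example, not something stated in the comment.

```python
def resize_to_training_width(width, height, target_width=518, multiple=14):
    """Return (new_width, new_height) with the width fixed at target_width.

    Assumption for this sketch: the height is rounded to the nearest
    multiple of `multiple` (14 here); the comment above only states that
    training images have a width of 518.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_width / width
    # Keep the aspect ratio, then snap the height to the chosen multiple.
    new_height = max(multiple, round(height * scale / multiple) * multiple)
    return target_width, new_height


if __name__ == "__main__":
    # A portrait image: height is twice the width.
    print(resize_to_training_width(1000, 2000))
```

Whether the height should afterwards be cropped or padded depends on the preprocessing actually used, which the truncated comment does not specify.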

input image shape height > width

Hi @yaseryacoob, thanks for the detailed discussion; the example looks great! If you're aiming for multi-view consistency while preserving high resolution, one possible solution is to use our predicted depth...

input image shape height > width

Hi @ChenYutongTHU, are you using the undistorted or the original version of ETH3D? The original version of ETH3D was captured by a non-pinhole camera and has noticeable distortion. You need to...

Clarification on Camera Coordinate System and Translation Scale in VGGT

Hi, yes, the depth prediction is relative depth. The cameras are cam_from_world, i.e. world_to_cam. For aligning to the ground-truth scale, the github issue below contains the code we use for...
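To make the cam_from_world (world_to_cam) convention concrete, here is a minimal NumPy sketch. It assumes the extrinsic is a 3x4 or 4x4 matrix `[R | t]` that maps world points to camera coordinates as `x_cam = R @ x_world + t`; the function names are hypothetical and are not part of the VGGT code. Under that assumption the camera center in world coordinates is `-R^T t`.

```python
import numpy as np


def world_to_cam(points_world, extrinsic):
    """Map (N, 3) world points into the camera frame: x_cam = R @ x_world + t."""
    extrinsic = np.asarray(extrinsic, dtype=np.float64)
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    return np.asarray(points_world, dtype=np.float64) @ R.T + t


def camera_center_in_world(extrinsic):
    """Camera center in world coordinates for a cam_from_world extrinsic."""
    extrinsic = np.asarray(extrinsic, dtype=np.float64)
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    # Solve R @ c + t = 0 for c; R is a rotation, so its inverse is R.T.
    return -R.T @ t


if __name__ == "__main__":
    # Hypothetical extrinsic: 90 degree rotation about z plus a translation.
    R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = np.array([1.0, 2.0, 3.0])
    E = np.hstack([R, t[:, None]])
    c = camera_center_in_world(E)
    # The camera center must map to the origin of the camera frame.
    print(c, world_to_cam(c[None, :], E))
```

Because the predicted depth is relative, translations recovered this way are only defined up to scale until they are aligned to ground truth.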

Ran out of memory

Hi, please check [here](https://github.com/facebookresearch/vggt/blob/c4b5da2d8592a33d52fb6c93af333ddf35b5bcb9/demo_gradio.py#L212); you can simply save the prediction dictionary as:

    with torch.no_grad():
        predictions = run_model(target_dir, model)

    # Save predictions
    prediction_save_path = os.path.join(target_dir, "predictions.npz")
    np.savez(prediction_save_path, **predictions)

    # Handle None...
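A saved `predictions.npz` can be read back later without rerunning the model, which may help when memory is tight. The sketch below is a self-contained round trip with made-up arrays; the helper names and the keys `depth` and `extrinsic` are hypothetical stand-ins, and dropping `None` entries before saving is a choice made for this example rather than what the demo script does.

```python
import os
import tempfile

import numpy as np


def save_predictions(predictions, path):
    """Save a dict of arrays to an .npz file, skipping entries that are None."""
    arrays = {k: np.asarray(v) for k, v in predictions.items() if v is not None}
    np.savez(path, **arrays)


def load_predictions(path):
    """Load every array from an .npz file back into a plain dict."""
    with np.load(path) as data:
        return {key: data[key] for key in data.files}


if __name__ == "__main__":
    # Dummy data standing in for real model outputs (hypothetical keys).
    predictions = {
        "depth": np.random.rand(2, 4, 4).astype(np.float32),
        "extrinsic": np.tile(np.eye(4)[:3], (2, 1, 1)),
        "unused": None,
    }
    with tempfile.TemporaryDirectory() as target_dir:
        path = os.path.join(target_dir, "predictions.npz")
        save_predictions(predictions, path)
        restored = load_predictions(path)
    print(sorted(restored))
```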

Jianyuan Wang

input image shape height > width

input image shape height > width

input image shape height > width

input image shape height > width

Clarification on Camera Coordinate System and Translation Scale in VGGT

Ran out of memory

Tracking head fine tune

Show Point Cloud in Model3D

activations and training stability

Evaluating the pose on ScanNet