VGGT Finetuning Issue on VKITTI Datasets
Hi, thank you for open-sourcing this awesome work! We are trying to finetune VGGT (starting from the original checkpoint) on the VKITTI dataset on a single NVIDIA RTX 6000 Ada (48 GB), but the output visualization from demo_viser shows duplicated, layered point clouds (see the attached screenshots). We are finetuning with the aggregator frozen.
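Before launching training, we double-check which modules actually stay trainable once the aggregator is frozen. This is a minimal sketch of our own (not part of the repo); it simply emulates the freezing that the trainer applies via optim.frozen_module_names:

from vggt.models.vggt import VGGT

# Build the model with the same heads as in the config below
# (random init is fine for a parameter count; no checkpoint needed here).
model = VGGT(enable_camera=True, enable_depth=True, enable_point=True, enable_track=True)

# Emulate optim.frozen_module_names: freeze everything under the aggregator.
for name, param in model.named_parameters():
    if name.startswith("aggregator"):
        param.requires_grad = False

# Count trainable parameters per top-level module (i.e. the prediction heads).
trainable = {}
for name, param in model.named_parameters():
    if param.requires_grad:
        top = name.split(".")[0]
        trainable[top] = trainable.get(top, 0) + param.numel()

for module_name, count in sorted(trainable.items()):
    print(f"{module_name}: {count / 1e6:.1f}M trainable params")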
Here's our default.yaml:
defaults:
- self
- default_dataset.yaml
exp_name: exp010_ft_og_ckpt_corrected
img_size: 518
num_workers: 8
seed_value: 42
accum_steps: 2  # Gradient accumulation was not used in the original training, but you can try it if you run into OOM.
patch_size: 14
val_epoch_freq: 1000000000
max_img_per_gpu: 48
limit_train_batches: 800
limit_val_batches: 100
data:
  train:
    _target_: data.dynamic_dataloader.DynamicTorchDataset
    num_workers: ${num_workers}
    max_img_per_gpu: ${max_img_per_gpu}
    common_config:
      img_size: ${img_size}
      patch_size: ${patch_size}
      debug: True
      repeat_batch: False
    dataset:
      _target_: data.composed_dataset.ComposedDataset
      dataset_configs:
        - _target_: data.datasets.vkitti.VKittiDataset
          split: train
          VKitti_DIR: /home/sbangal4/vggt_new/vggt/data/vkitti/vkitti
  val:
    _target_: data.dynamic_dataloader.DynamicTorchDataset
    num_workers: ${num_workers}
    max_img_per_gpu: ${max_img_per_gpu}
    common_config:
      img_size: ${img_size}
      patch_size: ${patch_size}
      debug: True
    dataset:
      _target_: data.composed_dataset.ComposedDataset
      dataset_configs:
        - _target_: data.datasets.vkitti.VKittiDataset
          split: train
          VKitti_DIR: /home/sbangal4/vggt_new/vggt/data/vkitti/vkitti
logging:
  log_dir: logs
  log_visuals: False
  log_freq: 1
  log_level_primary: DEBUG
  log_level_secondary: WARNING
  all_ranks: False
  tensorboard_writer:
    _target_: train_utils.tb_writer.TensorBoardLogger
    path: ${logging.log_dir}/tensorboard
  scalar_keys_to_log:
    train:
      keys_to_log:
        - loss_objective
        - loss_camera
        - loss_T
        - loss_R
        - loss_FL
        - loss_conf_depth
        - loss_reg_depth
        - loss_grad_depth
    val:
      keys_to_log:
        - loss_objective
        - loss_camera
        - loss_T
        - loss_R
        - loss_FL
        - loss_conf_depth
        - loss_reg_depth
        - loss_grad_depth
checkpoint:
  save_dir: logs/${exp_name}/ckpts
  save_freq: 5
  resume_checkpoint_path: /home/sbangal4/vggt_new/vggt/checkpoint/model.pt
  strict: False
loss:
  _target_: loss.MultitaskLoss
  camera:
    weight: 5.0
    loss_type: "l1"  # The paper uses smooth l1 loss, but we found l1 loss is more stable than smooth l1 and l2 loss.
  depth:
    weight: 1.0
    gradient_loss_fn: "grad"
    valid_range: 0.98
  # point: null
  # If you want to enable point, use the following config
  point:
    weight: 1.0
    gradient_loss_fn: "normal"
    valid_range: 0.98
  track: null
optim:
  param_group_modifiers: False
  optimizer:
    _target_: torch.optim.AdamW
    lr: 1e-6
    weight_decay: 0.05
  frozen_module_names:
    - ["aggregator"]
  amp:
    enabled: True
    amp_dtype: bfloat16
  gradient_clip:
    _target_: train_utils.gradient_clip.GradientClipper
    configs:
      - module_name: [""]
        params: [".*"]
        max_norm: 1.0
        norm_type: 2
  options:
    lr:
      - scheduler:
          _target_: fvcore.common.param_scheduler.CompositeParamScheduler
          schedulers:
            - _target_: fvcore.common.param_scheduler.LinearParamScheduler
              start_value: 1e-8
              end_value: 5e-5
            - _target_: fvcore.common.param_scheduler.CosineParamScheduler
              start_value: 5e-5
              end_value: 1e-8
          lengths: [0.05, 0.95]
          interval_scaling: ['rescaled', 'rescaled']
    weight_decay:
      - scheduler:
          _target_: fvcore.common.param_scheduler.ConstantParamScheduler
          value: 0.05
max_epochs: 20
model:
  _target_: vggt.models.vggt.VGGT
  enable_camera: True
  enable_depth: True
  enable_point: True
  enable_track: True
distributed:
  backend: nccl
  comms_dtype: None
  find_unused_parameters: True
  timeout_mins: 30
  gradient_as_bucket_view: True  # Less memory used
  bucket_cap_mb: 25
  broadcast_buffers: True
cuda:
  cudnn_deterministic: False
  cudnn_benchmark: False
  allow_tf32: True
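On the duplicated geometry: a cheap check after finetuning is whether the point head still agrees with points obtained by unprojecting the depth head through the predicted cameras; if the two branches drift apart (or per-frame predictions stop being registered to a common frame), the visualization can show offset copies of the scene. Below is a rough diagnostic sketch following the inference API from the repo README; the checkpoint path and frame names are placeholders, this is not necessarily the root cause, and exact tensor shapes may need small adjustments:

import numpy as np
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images
from vggt.utils.pose_enc import pose_encoding_to_extri_intri
from vggt.utils.geometry import unproject_depth_map_to_point_map

device = "cuda"
model = VGGT(enable_camera=True, enable_depth=True, enable_point=True, enable_track=True).to(device)

# Placeholder checkpoint path: point this at the finetuned weights written by the trainer.
ckpt = torch.load("logs/exp010_ft_og_ckpt_corrected/ckpts/checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt.get("model", ckpt), strict=False)
model.eval()

# Placeholder frames from one VKITTI sequence.
images = load_and_preprocess_images(["frame_000.png", "frame_001.png", "frame_002.png"]).to(device)

with torch.no_grad(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
    preds = model(images)

# Recover cameras from the pose encoding, then unproject the predicted depth maps.
extrinsic, intrinsic = pose_encoding_to_extri_intri(preds["pose_enc"], images.shape[-2:])
depth = preds["depth"].squeeze(0).float().cpu().numpy()                    # (S, H, W, 1)
points_from_depth = unproject_depth_map_to_point_map(
    depth,
    extrinsic.squeeze(0).float().cpu().numpy(),
    intrinsic.squeeze(0).float().cpu().numpy(),
)
points_from_head = preds["world_points"].squeeze(0).float().cpu().numpy()  # (S, H, W, 3)

# A large gap means the depth+camera branch and the point branch no longer agree.
print("mean |point head - unprojected depth|:",
      np.abs(points_from_head - np.asarray(points_from_depth)).mean())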
Hello, have you found a solution? I'm also curious how your training is conducted with
limit_train_batches: 800
limit_val_batches: 100
This setting is only meant for quick testing, which means not enough gradient steps are run.
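For a rough sense of scale, assuming limit_train_batches caps the batches per epoch and accum_steps folds consecutive batches into one optimizer update, the posted config comes out to only a few thousand updates:

# Back-of-the-envelope update count for the posted config
# (assumes limit_train_batches caps batches per epoch and accum_steps
# folds that many batches into a single optimizer update).
limit_train_batches = 800
accum_steps = 2
max_epochs = 20

updates_per_epoch = limit_train_batches // accum_steps   # 400
total_updates = updates_per_epoch * max_epochs            # 8,000
print(f"{total_updates} optimizer updates over {max_epochs} epochs")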
Hello, I also noticed this issue. But after setting limit_train_batches to null, training proceeds according to len_train, right? The default value of 100000 is really too large.