RuntimeError: batch size must be positive when accum_steps > batch size in dynamic batching
Hi jianyuan, thanks for this amazing work!
When using gradient accumulation with dynamic batching, training crashes with `RuntimeError: batch size must be positive` whenever the dynamically computed batch size is smaller than `accum_steps`. I think there might be a bug at https://github.com/facebookresearch/vggt/blob/main/training/trainer.py#L823

The issue occurs in the interaction between dynamic batch sampling and gradient accumulation:

- The `DynamicBatchSampler` computes the batch size as `batch_size = max_img_per_gpu / random_image_num`.
- When `random_image_num` is large (e.g., 24), this results in a small batch size (e.g., 48 / 24 = 2).
- The `chunk_batch_for_accum_steps` function then tries to split this batch into `accum_steps` chunks.
- With `accum_steps=4` and `batch_size=2`, the integer division `2 // 4 = 0` produces empty batches, as sketched below.
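For illustration, here is a minimal sketch of the failure mode. The name `chunk_batch_for_accum_steps` comes from `trainer.py`, but the chunking logic and tensor shapes below are my assumptions, not the repo's exact code:

```python
import torch

def chunk_batch_for_accum_steps(batch, accum_steps):
    # Assumed behavior: split the batch along dim 0 into accum_steps pieces.
    chunk_size = batch.shape[0] // accum_steps  # integer division: 2 // 4 == 0
    return [batch[i * chunk_size:(i + 1) * chunk_size] for i in range(accum_steps)]

# Dynamic batching: batch_size = max_img_per_gpu // random_image_num = 48 // 24 = 2
batch = torch.randn(2, 24, 3, 224, 224)  # (batch, images, C, H, W); shapes illustrative

chunks = chunk_batch_for_accum_steps(batch, accum_steps=4)
print([c.shape[0] for c in chunks])  # [0, 0, 0, 0] -- every chunk is empty
```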
Reproduction Steps:

In `default.yaml`:

```yaml
accum_steps: 4
max_img_per_gpu: 48
```
Complete Traceback:

```
ERROR 2025-08-20 20:20:46,326 trainer.py: 421: Training failed with error: batch size must be positive
[rank1]: Traceback (most recent call last):
[rank1]:   File "/path/to/training/launch.py", line 31, in <module>
```
Could you please confirm whether there is a problem here? It should be easy to fix by enforcing a minimum batch size during chunk splitting, for example along the lines of the sketch below. Thanks again for your contribution to the community!
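A possible guard (a hypothetical helper, not the repo's actual code) could clamp the number of chunks to the batch size so that no chunk comes out empty:

```python
import torch

def chunk_batch_for_accum_steps_safe(batch, accum_steps):
    # Fall back to fewer accumulation steps rather than produce empty chunks.
    effective_steps = min(accum_steps, batch.shape[0])  # e.g. min(4, 2) == 2
    return list(torch.chunk(batch, effective_steps, dim=0))
```

With `accum_steps=4` and a batch of 2, this would yield two chunks of one sample each; the effective step count would also need to be propagated to the loss scaling.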
Yes, this is expected. If you want to accumulate N steps, the batch size B should be a multiple of N.
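In other words, a sanity check along these lines (hypothetical, not in the repo) would fail fast with a clearer message:

```python
assert batch_size % accum_steps == 0, (
    f"batch_size={batch_size} must be a multiple of accum_steps={accum_steps}"
)
```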
This helps a lot! By the way, you can also set `accum_steps: 1` in `default.yaml`.