Jintao Lin

If we use the `VILADistributedSampler` (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/train/llava_trainer.py#L274-L281) for distributed training, should `gradient_accumulation_steps` be hardcoded to 1? I ask because I noticed that when I use 4 nodes (8 GPUs per node) to...
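For context, here is a minimal sketch of the batch-size arithmetic behind the question. It assumes the usual data-parallel convention (for example, Hugging Face `Trainer`), where one optimizer step consumes `per_device_batch × world_size × gradient_accumulation_steps` samples. The helper names, the per-device batch size of 4, and the 10,000-sample dataset are hypothetical. The sketch does not reproduce what `VILADistributedSampler` does internally. It only shows how the 4 × 8 = 32 GPU setup combines with different accumulation values.

```python
# Hypothetical helpers, not part of VILA: they show how per-device batch size,
# world size and gradient accumulation combine under a standard data-parallel setup.


def global_batch_size(per_device_batch: int, nodes: int, gpus_per_node: int,
                      grad_accum_steps: int) -> int:
    """Samples consumed by one optimizer step across all ranks."""
    world_size = nodes * gpus_per_node
    return per_device_batch * world_size * grad_accum_steps


def optimizer_steps_per_epoch(num_samples: int, per_device_batch: int,
                              nodes: int, gpus_per_node: int,
                              grad_accum_steps: int) -> int:
    """Optimizer steps per epoch, discarding any incomplete global batch."""
    return num_samples // global_batch_size(
        per_device_batch, nodes, gpus_per_node, grad_accum_steps)


if __name__ == "__main__":
    # 4 nodes x 8 GPUs as in the question; batch size and dataset size are made up.
    for accum in (1, 2, 4):
        gbs = global_batch_size(4, 4, 8, accum)
        steps = optimizer_steps_per_epoch(10_000, 4, 4, 8, accum)
        print(f"grad_accum={accum}: global batch={gbs}, steps/epoch={steps}")
```

If the sampler already groups samples per rank in a way that assumes one batch per optimizer step, changing `gradient_accumulation_steps` alters both numbers above. That interaction is what the question is probing.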