Alberto Zhu

3 issues by Alberto Zhu

[rank0]:[E ProcessGroupNCCL.cpp:1316] [PG 0 Rank 0] Heartbeat monitor timed out! Process will be terminated after dumping debug info. workMetaList_.size()=1
[rank0]:[E ProcessGroupNCCL.cpp:1153] [PG 0 Rank 0] ProcessGroupNCCL preparing to dump debug...

![image](https://github.com/user-attachments/assets/08506387-34bf-44b7-81f8-1baf85ccec3a) Take 5s video segments from hundreds of videos. Each 5s segment contributes 10 frames, which are used to train DINOv2 from scratch. The input tensor shape of the...
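To make the sampling described above concrete, here is a minimal sketch of picking 10 frame indices from one 5s segment. The issue does not say how the frames are chosen, so the even spacing, the function name `segment_frame_indices`, and the 30 fps value in the usage lines are all assumptions made for illustration, not the author's actual pipeline.

```python
def segment_frame_indices(segment_start_s, fps, segment_len_s=5.0, frames_per_segment=10):
    """Return frame indices for one segment (hypothetical helper).

    Assumes frames are taken at even spacing within the segment;
    the original issue does not state the sampling rule.
    """
    # Index of the first frame of the segment in the full video.
    first = int(round(segment_start_s * fps))
    # Number of video frames covered by the segment.
    n_frames = int(round(segment_len_s * fps))
    # Distance (in frames) between two consecutive sampled frames.
    step = n_frames / frames_per_segment
    return [first + int(i * step) for i in range(frames_per_segment)]


# Usage: the second 5s segment of a video, assuming 30 fps.
indices = segment_frame_indices(segment_start_s=5.0, fps=30)
print(indices)
```

With a fixed stride like this, every segment yields the same number of frames, so stacking segments into a batch gives a constant frame dimension regardless of the source video's length.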

I20240724 09:37:05 4020743 dino_fl helpers.py:102] Training [ 3490/125000] eta: 14:39:00 lr: 0.0000 (0.0000) wd: 0.0407 (0.0402) mom: 0.9920 (0.9920) last_layer_lr: 0.0000 (0.0000) current_batch_size: 2.0000 (2.0000) total_loss: 0.0078 (2.4457) stage_local_mse_loss: 0.0039...