lightning-hydra-template
wandb log contains duplicated logs
I am using this repo with the wandb logger. However, when checking the logs on the wandb website, I noticed that many lines are duplicated. Can you help me track down the cause of this issue? Here is an excerpt of the duplicated output:
```
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 9, global step 3730: 'val/criteria' was not in top 1
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
`Trainer.fit` stopped: `max_epochs=10` reached.
Restoring states from the checkpoint path at /projects/leelab2/chanwkim/dermatology_datasets/logs/train/runs/2023-01-14_21-21-44/checkpoints/epoch_004.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Loaded model weights from checkpoint at /projects/leelab2/chanwkim/dermatology_datasets/logs/train/runs/2023-01-14_21-21-44/checkpoints/epoch_004.ckpt
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
Epoch 0/9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/467 0:00:00 • -:--:-- 0.00it/s loss: nan v_num: uhkz val/loss: 4.883 val/criteria: -4.883 val/criteria_best: -4.883
```
The cause might be a separate training process running on each of your GPUs: with 4 devices I would expect roughly 4 times as many log lines.
I'm not knowledgeable about how wandb handles logging in a multi-GPU setup, so I can't really help further.
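If that turns out to be the cause, one common workaround (just a sketch, not verified against this template) is to guard any manual console printing so it only runs on the main process. This assumes PyTorch Lightning's `rank_zero_only` utility, which turns the decorated function into a no-op on every process except global rank 0:

```python
# Hedged sketch: make custom console output run only on global rank 0 so a
# DDP run with N GPUs does not print the same line N times.
# Assumes pytorch_lightning >= 1.6, where `rank_zero_only` is exported from
# pytorch_lightning.utilities.
from pytorch_lightning.utilities import rank_zero_only


@rank_zero_only
def log_line(message: str) -> None:
    # No-op on all non-zero ranks; prints once per step on rank 0 only.
    print(message)


log_line("val/loss: 4.883 val/criteria: -4.883")
```

Similarly, if the duplicates come from per-process metric logging rather than console output, `self.log(..., rank_zero_only=True)` inside the LightningModule restricts that metric to rank 0.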
I'm using the wandb logger on a single GTX 1080 Ti and still see the same duplication. I'm trying to track down the cause.
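Not an answer, but one way to narrow this down: the duplicated lines in the excerpt above are progress-bar redraws rather than metric rows, so it may be worth testing whether they come from wandb's console capture of the terminal output. A minimal sketch, assuming `WandbLogger` forwards extra keyword arguments to `wandb.init()` and that the installed wandb version supports `Settings(console="off")` (the project name below is hypothetical):

```python
# Hedged sketch: disable wandb's console capture to check whether the
# duplicated "Epoch 0/9 ..." lines are just progress-bar redraws being
# mirrored into the run's console log.
import wandb
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(
    project="my-project",  # hypothetical project name
    settings=wandb.Settings(console="off"),  # stop mirroring terminal output
)
# Pass `logger` to Trainer(...) as usual; if the duplicates disappear from the
# wandb console log, the logged metrics themselves were never duplicated.
```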