SimMIM
Loss goes NaN after 14 epochs
Hi, thank you for releasing such wonderful work. I tried to replicate the results with the following command:
python -m torch.distributed.launch --nproc_per_node 8 main_simmim.py --cfg configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml --data-path /mnt/fsx/datasets/imagenet/train --accumulation-steps 2
which gave me NaN loss after 14 epochs:
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 22): INFO >>>>>>>>>> Build Optimizer for Pre-training Stage
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 27): INFO No weight decay: {'encoder.mask_token', 'encoder.absolute_pos_embed'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 30): INFO No weight decay keywords: {'encoder.relative_position_bias_table'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 63): INFO No decay params: ['encoder.mask_token', 'encoder.patch_embed.proj.bias', 'encoder.patch_embed.norm.weight', 'encoder.patch_embed.norm.bias', 'encoder.layers.0.blocks.0.norm1.weight', 'encoder.layers.0.blocks.0.norm1.bias', 'encoder.layers.0.blocks.0.attn.qkv.bias', 'encoder.layers.0.blocks.0.attn.proj.bias', 'encoder.layers.0.blocks.0.norm2.weight', 'encoder.layers.0.blocks.0.norm2.bias', 'encoder.layers.0.blocks.0.mlp.fc1.bias', 'encoder.layers.0.blocks.0.mlp.fc2.bias', 'encoder.layers.0.blocks.1.norm1.weight', 'encoder.layers.0.blocks.1.norm1.bias', 'encoder.layers.0.blocks.1.attn.qkv.bias', 'encoder.layers.0.blocks.1.attn.proj.bias', 'encoder.layers.0.blocks.1.norm2.weight', 'encoder.layers.0.blocks.1.norm2.bias', 'encoder.layers.0.blocks.1.mlp.fc1.bias', 'encoder.layers.0.blocks.1.mlp.fc2.bias', 'encoder.layers.0.downsample.norm.weight', 'encoder.layers.0.downsample.norm.bias', 'encoder.layers.1.blocks.0.norm1.weight', 'encoder.layers.1.blocks.0.norm1.bias', 'encoder.layers.1.blocks.0.attn.qkv.bias', 'encoder.layers.1.blocks.0.attn.proj.bias', 'encoder.layers.1.blocks.0.norm2.weight', 'encoder.layers.1.blocks.0.norm2.bias', 'encoder.layers.1.blocks.0.mlp.fc1.bias', 'encoder.layers.1.blocks.0.mlp.fc2.bias', 'encoder.layers.1.blocks.1.norm1.weight', 'encoder.layers.1.blocks.1.norm1.bias', 'encoder.layers.1.blocks.1.attn.qkv.bias', 'encoder.layers.1.blocks.1.attn.proj.bias', 'encoder.layers.1.blocks.1.norm2.weight', 'encoder.layers.1.blocks.1.norm2.bias', 'encoder.layers.1.blocks.1.mlp.fc1.bias', 'encoder.layers.1.blocks.1.mlp.fc2.bias', 'encoder.layers.1.downsample.norm.weight', 'encoder.layers.1.downsample.norm.bias', 'encoder.layers.2.blocks.0.norm1.weight', 'encoder.layers.2.blocks.0.norm1.bias', 'encoder.layers.2.blocks.0.attn.qkv.bias', 'encoder.layers.2.blocks.0.attn.proj.bias', 'encoder.layers.2.blocks.0.norm2.weight', 'encoder.layers.2.blocks.0.norm2.bias', 'encoder.layers.2.blocks.0.mlp.fc1.bias', 'encoder.layers.2.blocks.0.mlp.fc2.bias', 'encoder.layers.2.blocks.1.norm1.weight', 'encoder.layers.2.blocks.1.norm1.bias', 'encoder.layers.2.blocks.1.attn.qkv.bias', 'encoder.layers.2.blocks.1.attn.proj.bias', 'encoder.layers.2.blocks.1.norm2.weight', 'encoder.layers.2.blocks.1.norm2.bias', 'encoder.layers.2.blocks.1.mlp.fc1.bias', 'encoder.layers.2.blocks.1.mlp.fc2.bias', 'encoder.layers.2.blocks.2.norm1.weight', 'encoder.layers.2.blocks.2.norm1.bias', 'encoder.layers.2.blocks.2.attn.qkv.bias', 'encoder.layers.2.blocks.2.attn.proj.bias', 'encoder.layers.2.blocks.2.norm2.weight', 'encoder.layers.2.blocks.2.norm2.bias', 'encoder.layers.2.blocks.2.mlp.fc1.bias', 'encoder.layers.2.blocks.2.mlp.fc2.bias', 'encoder.layers.2.blocks.3.norm1.weight', 'encoder.layers.2.blocks.3.norm1.bias', 'encoder.layers.2.blocks.3.attn.qkv.bias', 'encoder.layers.2.blocks.3.attn.proj.bias', 'encoder.layers.2.blocks.3.norm2.weight', 'encoder.layers.2.blocks.3.norm2.bias', 'encoder.layers.2.blocks.3.mlp.fc1.bias', 'encoder.layers.2.blocks.3.mlp.fc2.bias', 'encoder.layers.2.blocks.4.norm1.weight', 'encoder.layers.2.blocks.4.norm1.bias', 'encoder.layers.2.blocks.4.attn.qkv.bias', 'encoder.layers.2.blocks.4.attn.proj.bias', 'encoder.layers.2.blocks.4.norm2.weight', 'encoder.layers.2.blocks.4.norm2.bias', 'encoder.layers.2.blocks.4.mlp.fc1.bias', 'encoder.layers.2.blocks.4.mlp.fc2.bias', 'encoder.layers.2.blocks.5.norm1.weight', 'encoder.layers.2.blocks.5.norm1.bias', 'encoder.layers.2.blocks.5.attn.qkv.bias', 'encoder.layers.2.blocks.5.attn.proj.bias', 
'encoder.layers.2.blocks.5.norm2.weight', 'encoder.layers.2.blocks.5.norm2.bias', 'encoder.layers.2.blocks.5.mlp.fc1.bias', 'encoder.layers.2.blocks.5.mlp.fc2.bias', 'encoder.layers.2.blocks.6.norm1.weight', 'encoder.layers.2.blocks.6.norm1.bias', 'encoder.layers.2.blocks.6.attn.qkv.bias', 'encoder.layers.2.blocks.6.attn.proj.bias', 'encoder.layers.2.blocks.6.norm2.weight', 'encoder.layers.2.blocks.6.norm2.bias', 'encoder.layers.2.blocks.6.mlp.fc1.bias', 'encoder.layers.2.blocks.6.mlp.fc2.bias', 'encoder.layers.2.blocks.7.norm1.weight', 'encoder.layers.2.blocks.7.norm1.bias', 'encoder.layers.2.blocks.7.attn.qkv.bias', 'encoder.layers.2.blocks.7.attn.proj.bias', 'encoder.layers.2.blocks.7.norm2.weight', 'encoder.layers.2.blocks.7.norm2.bias', 'encoder.layers.2.blocks.7.mlp.fc1.bias', 'encoder.layers.2.blocks.7.mlp.fc2.bias', 'encoder.layers.2.blocks.8.norm1.weight', 'encoder.layers.2.blocks.8.norm1.bias', 'encoder.layers.2.blocks.8.attn.qkv.bias', 'encoder.layers.2.blocks.8.attn.proj.bias', 'encoder.layers.2.blocks.8.norm2.weight', 'encoder.layers.2.blocks.8.norm2.bias', 'encoder.layers.2.blocks.8.mlp.fc1.bias', 'encoder.layers.2.blocks.8.mlp.fc2.bias', 'encoder.layers.2.blocks.9.norm1.weight', 'encoder.layers.2.blocks.9.norm1.bias', 'encoder.layers.2.blocks.9.attn.qkv.bias', 'encoder.layers.2.blocks.9.attn.proj.bias', 'encoder.layers.2.blocks.9.norm2.weight', 'encoder.layers.2.blocks.9.norm2.bias', 'encoder.layers.2.blocks.9.mlp.fc1.bias', 'encoder.layers.2.blocks.9.mlp.fc2.bias', 'encoder.layers.2.blocks.10.norm1.weight', 'encoder.layers.2.blocks.10.norm1.bias', 'encoder.layers.2.blocks.10.attn.qkv.bias', 'encoder.layers.2.blocks.10.attn.proj.bias', 'encoder.layers.2.blocks.10.norm2.weight', 'encoder.layers.2.blocks.10.norm2.bias', 'encoder.layers.2.blocks.10.mlp.fc1.bias', 'encoder.layers.2.blocks.10.mlp.fc2.bias', 'encoder.layers.2.blocks.11.norm1.weight', 'encoder.layers.2.blocks.11.norm1.bias', 'encoder.layers.2.blocks.11.attn.qkv.bias', 'encoder.layers.2.blocks.11.attn.proj.bias', 'encoder.layers.2.blocks.11.norm2.weight', 'encoder.layers.2.blocks.11.norm2.bias', 'encoder.layers.2.blocks.11.mlp.fc1.bias', 'encoder.layers.2.blocks.11.mlp.fc2.bias', 'encoder.layers.2.blocks.12.norm1.weight', 'encoder.layers.2.blocks.12.norm1.bias', 'encoder.layers.2.blocks.12.attn.qkv.bias', 'encoder.layers.2.blocks.12.attn.proj.bias', 'encoder.layers.2.blocks.12.norm2.weight', 'encoder.layers.2.blocks.12.norm2.bias', 'encoder.layers.2.blocks.12.mlp.fc1.bias', 'encoder.layers.2.blocks.12.mlp.fc2.bias', 'encoder.layers.2.blocks.13.norm1.weight', 'encoder.layers.2.blocks.13.norm1.bias', 'encoder.layers.2.blocks.13.attn.qkv.bias', 'encoder.layers.2.blocks.13.attn.proj.bias', 'encoder.layers.2.blocks.13.norm2.weight', 'encoder.layers.2.blocks.13.norm2.bias', 'encoder.layers.2.blocks.13.mlp.fc1.bias', 'encoder.layers.2.blocks.13.mlp.fc2.bias', 'encoder.layers.2.blocks.14.norm1.weight', 'encoder.layers.2.blocks.14.norm1.bias', 'encoder.layers.2.blocks.14.attn.qkv.bias', 'encoder.layers.2.blocks.14.attn.proj.bias', 'encoder.layers.2.blocks.14.norm2.weight', 'encoder.layers.2.blocks.14.norm2.bias', 'encoder.layers.2.blocks.14.mlp.fc1.bias', 'encoder.layers.2.blocks.14.mlp.fc2.bias', 'encoder.layers.2.blocks.15.norm1.weight', 'encoder.layers.2.blocks.15.norm1.bias', 'encoder.layers.2.blocks.15.attn.qkv.bias', 'encoder.layers.2.blocks.15.attn.proj.bias', 'encoder.layers.2.blocks.15.norm2.weight', 'encoder.layers.2.blocks.15.norm2.bias', 'encoder.layers.2.blocks.15.mlp.fc1.bias', 
'encoder.layers.2.blocks.15.mlp.fc2.bias', 'encoder.layers.2.blocks.16.norm1.weight', 'encoder.layers.2.blocks.16.norm1.bias', 'encoder.layers.2.blocks.16.attn.qkv.bias', 'encoder.layers.2.blocks.16.attn.proj.bias', 'encoder.layers.2.blocks.16.norm2.weight', 'encoder.layers.2.blocks.16.norm2.bias', 'encoder.layers.2.blocks.16.mlp.fc1.bias', 'encoder.layers.2.blocks.16.mlp.fc2.bias', 'encoder.layers.2.blocks.17.norm1.weight', 'encoder.layers.2.blocks.17.norm1.bias', 'encoder.layers.2.blocks.17.attn.qkv.bias', 'encoder.layers.2.blocks.17.attn.proj.bias', 'encoder.layers.2.blocks.17.norm2.weight', 'encoder.layers.2.blocks.17.norm2.bias', 'encoder.layers.2.blocks.17.mlp.fc1.bias', 'encoder.layers.2.blocks.17.mlp.fc2.bias', 'encoder.layers.2.downsample.norm.weight', 'encoder.layers.2.downsample.norm.bias', 'encoder.layers.3.blocks.0.norm1.weight', 'encoder.layers.3.blocks.0.norm1.bias', 'encoder.layers.3.blocks.0.attn.qkv.bias', 'encoder.layers.3.blocks.0.attn.proj.bias', 'encoder.layers.3.blocks.0.norm2.weight', 'encoder.layers.3.blocks.0.norm2.bias', 'encoder.layers.3.blocks.0.mlp.fc1.bias', 'encoder.layers.3.blocks.0.mlp.fc2.bias', 'encoder.layers.3.blocks.1.norm1.weight', 'encoder.layers.3.blocks.1.norm1.bias', 'encoder.layers.3.blocks.1.attn.qkv.bias', 'encoder.layers.3.blocks.1.attn.proj.bias', 'encoder.layers.3.blocks.1.norm2.weight', 'encoder.layers.3.blocks.1.norm2.bias', 'encoder.layers.3.blocks.1.mlp.fc1.bias', 'encoder.layers.3.blocks.1.mlp.fc2.bias', 'encoder.norm.weight', 'encoder.norm.bias', 'decoder.0.bias']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 64): INFO Has decay params: ['encoder.patch_embed.proj.weight', 'encoder.layers.0.blocks.0.attn.relative_position_bias_table', 'encoder.layers.0.blocks.0.attn.qkv.weight', 'encoder.layers.0.blocks.0.attn.proj.weight', 'encoder.layers.0.blocks.0.mlp.fc1.weight', 'encoder.layers.0.blocks.0.mlp.fc2.weight', 'encoder.layers.0.blocks.1.attn.relative_position_bias_table', 'encoder.layers.0.blocks.1.attn.qkv.weight', 'encoder.layers.0.blocks.1.attn.proj.weight', 'encoder.layers.0.blocks.1.mlp.fc1.weight', 'encoder.layers.0.blocks.1.mlp.fc2.weight', 'encoder.layers.0.downsample.reduction.weight', 'encoder.layers.1.blocks.0.attn.relative_position_bias_table', 'encoder.layers.1.blocks.0.attn.qkv.weight', 'encoder.layers.1.blocks.0.attn.proj.weight', 'encoder.layers.1.blocks.0.mlp.fc1.weight', 'encoder.layers.1.blocks.0.mlp.fc2.weight', 'encoder.layers.1.blocks.1.attn.relative_position_bias_table', 'encoder.layers.1.blocks.1.attn.qkv.weight', 'encoder.layers.1.blocks.1.attn.proj.weight', 'encoder.layers.1.blocks.1.mlp.fc1.weight', 'encoder.layers.1.blocks.1.mlp.fc2.weight', 'encoder.layers.1.downsample.reduction.weight', 'encoder.layers.2.blocks.0.attn.relative_position_bias_table', 'encoder.layers.2.blocks.0.attn.qkv.weight', 'encoder.layers.2.blocks.0.attn.proj.weight', 'encoder.layers.2.blocks.0.mlp.fc1.weight', 'encoder.layers.2.blocks.0.mlp.fc2.weight', 'encoder.layers.2.blocks.1.attn.relative_position_bias_table', 'encoder.layers.2.blocks.1.attn.qkv.weight', 'encoder.layers.2.blocks.1.attn.proj.weight', 'encoder.layers.2.blocks.1.mlp.fc1.weight', 'encoder.layers.2.blocks.1.mlp.fc2.weight', 'encoder.layers.2.blocks.2.attn.relative_position_bias_table', 'encoder.layers.2.blocks.2.attn.qkv.weight', 'encoder.layers.2.blocks.2.attn.proj.weight', 'encoder.layers.2.blocks.2.mlp.fc1.weight', 'encoder.layers.2.blocks.2.mlp.fc2.weight', 'encoder.layers.2.blocks.3.attn.relative_position_bias_table', 'encoder.layers.2.blocks.3.attn.qkv.weight', 'encoder.layers.2.blocks.3.attn.proj.weight', 'encoder.layers.2.blocks.3.mlp.fc1.weight', 'encoder.layers.2.blocks.3.mlp.fc2.weight', 'encoder.layers.2.blocks.4.attn.relative_position_bias_table', 'encoder.layers.2.blocks.4.attn.qkv.weight', 'encoder.layers.2.blocks.4.attn.proj.weight', 'encoder.layers.2.blocks.4.mlp.fc1.weight', 'encoder.layers.2.blocks.4.mlp.fc2.weight', 'encoder.layers.2.blocks.5.attn.relative_position_bias_table', 'encoder.layers.2.blocks.5.attn.qkv.weight', 'encoder.layers.2.blocks.5.attn.proj.weight', 'encoder.layers.2.blocks.5.mlp.fc1.weight', 'encoder.layers.2.blocks.5.mlp.fc2.weight', 'encoder.layers.2.blocks.6.attn.relative_position_bias_table', 'encoder.layers.2.blocks.6.attn.qkv.weight', 'encoder.layers.2.blocks.6.attn.proj.weight', 'encoder.layers.2.blocks.6.mlp.fc1.weight', 'encoder.layers.2.blocks.6.mlp.fc2.weight', 'encoder.layers.2.blocks.7.attn.relative_position_bias_table', 'encoder.layers.2.blocks.7.attn.qkv.weight', 'encoder.layers.2.blocks.7.attn.proj.weight', 'encoder.layers.2.blocks.7.mlp.fc1.weight', 'encoder.layers.2.blocks.7.mlp.fc2.weight', 'encoder.layers.2.blocks.8.attn.relative_position_bias_table', 'encoder.layers.2.blocks.8.attn.qkv.weight', 'encoder.layers.2.blocks.8.attn.proj.weight', 'encoder.layers.2.blocks.8.mlp.fc1.weight', 'encoder.layers.2.blocks.8.mlp.fc2.weight', 'encoder.layers.2.blocks.9.attn.relative_position_bias_table', 'encoder.layers.2.blocks.9.attn.qkv.weight', 'encoder.layers.2.blocks.9.attn.proj.weight', 
'encoder.layers.2.blocks.9.mlp.fc1.weight', 'encoder.layers.2.blocks.9.mlp.fc2.weight', 'encoder.layers.2.blocks.10.attn.relative_position_bias_table', 'encoder.layers.2.blocks.10.attn.qkv.weight', 'encoder.layers.2.blocks.10.attn.proj.weight', 'encoder.layers.2.blocks.10.mlp.fc1.weight', 'encoder.layers.2.blocks.10.mlp.fc2.weight', 'encoder.layers.2.blocks.11.attn.relative_position_bias_table', 'encoder.layers.2.blocks.11.attn.qkv.weight', 'encoder.layers.2.blocks.11.attn.proj.weight', 'encoder.layers.2.blocks.11.mlp.fc1.weight', 'encoder.layers.2.blocks.11.mlp.fc2.weight', 'encoder.layers.2.blocks.12.attn.relative_position_bias_table', 'encoder.layers.2.blocks.12.attn.qkv.weight', 'encoder.layers.2.blocks.12.attn.proj.weight', 'encoder.layers.2.blocks.12.mlp.fc1.weight', 'encoder.layers.2.blocks.12.mlp.fc2.weight', 'encoder.layers.2.blocks.13.attn.relative_position_bias_table', 'encoder.layers.2.blocks.13.attn.qkv.weight', 'encoder.layers.2.blocks.13.attn.proj.weight', 'encoder.layers.2.blocks.13.mlp.fc1.weight', 'encoder.layers.2.blocks.13.mlp.fc2.weight', 'encoder.layers.2.blocks.14.attn.relative_position_bias_table', 'encoder.layers.2.blocks.14.attn.qkv.weight', 'encoder.layers.2.blocks.14.attn.proj.weight', 'encoder.layers.2.blocks.14.mlp.fc1.weight', 'encoder.layers.2.blocks.14.mlp.fc2.weight', 'encoder.layers.2.blocks.15.attn.relative_position_bias_table', 'encoder.layers.2.blocks.15.attn.qkv.weight', 'encoder.layers.2.blocks.15.attn.proj.weight', 'encoder.layers.2.blocks.15.mlp.fc1.weight', 'encoder.layers.2.blocks.15.mlp.fc2.weight', 'encoder.layers.2.blocks.16.attn.relative_position_bias_table', 'encoder.layers.2.blocks.16.attn.qkv.weight', 'encoder.layers.2.blocks.16.attn.proj.weight', 'encoder.layers.2.blocks.16.mlp.fc1.weight', 'encoder.layers.2.blocks.16.mlp.fc2.weight', 'encoder.layers.2.blocks.17.attn.relative_position_bias_table', 'encoder.layers.2.blocks.17.attn.qkv.weight', 'encoder.layers.2.blocks.17.attn.proj.weight', 'encoder.layers.2.blocks.17.mlp.fc1.weight', 'encoder.layers.2.blocks.17.mlp.fc2.weight', 'encoder.layers.2.downsample.reduction.weight', 'encoder.layers.3.blocks.0.attn.relative_position_bias_table', 'encoder.layers.3.blocks.0.attn.qkv.weight', 'encoder.layers.3.blocks.0.attn.proj.weight', 'encoder.layers.3.blocks.0.mlp.fc1.weight', 'encoder.layers.3.blocks.0.mlp.fc2.weight', 'encoder.layers.3.blocks.1.attn.relative_position_bias_table', 'encoder.layers.3.blocks.1.attn.qkv.weight', 'encoder.layers.3.blocks.1.attn.proj.weight', 'encoder.layers.3.blocks.1.mlp.fc1.weight', 'encoder.layers.3.blocks.1.mlp.fc2.weight', 'decoder.0.weight']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 43): INFO AdamW (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.0008
weight_decay: 0.05
Parameter Group 1
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.0008
weight_decay: 0.0
)
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 83): INFO number of params: 89874104
[2022-02-05 09:22:26 simmim_pretrain] (utils.py 81): INFO All checkpoints founded in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep: []
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 100): INFO no checkpoint found in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep, ignoring auto resume
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 105): INFO Start training
[2022-02-05 09:24:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][0/1251] eta 1 day, 15:53:49 lr 0.000004 time 114.8121 (114.8121) loss 0.5543 (0.5543) grad_norm 0.2902 (0.2902) mem 17192MB
[2022-02-05 09:45:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][100/1251] eta 4:24:36 lr 0.000010 time 0.3949 (13.7934) loss 0.4499 (0.4969) grad_norm 1.0401 (0.2900) mem 18238MB
[2022-02-05 10:06:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][200/1251] eta 3:52:30 lr 0.000017 time 75.5072 (13.2732) loss 0.3752 (0.4565) grad_norm 2.8639 (1.6425) mem 18238MB
[2022-02-05 10:28:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][300/1251] eta 3:27:27 lr 0.000023 time 0.3941 (13.0894) loss 0.3553 (0.4264) grad_norm 2.0591 (2.8358) mem 18238MB
[2022-02-05 10:48:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][400/1251] eta 3:02:30 lr 0.000029 time 57.4084 (12.8679) loss 0.3173 (0.4040) grad_norm 1.1405 (3.6005) mem 18238MB
[2022-02-05 11:08:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][500/1251] eta 2:38:59 lr 0.000036 time 0.3942 (12.7019) loss 0.3129 (0.3879) grad_norm 4.7302 (4.0156) mem 18238MB
[2022-02-05 11:29:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][600/1251] eta 2:17:56 lr 0.000042 time 86.9880 (12.7132) loss 0.3042 (0.3741) grad_norm 2.4576 (4.0197) mem 18238MB
[2022-02-05 11:49:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][700/1251] eta 1:55:17 lr 0.000048 time 0.3943 (12.5542) loss 0.2920 (0.3630) grad_norm 4.6089 (4.0017) mem 18239MB
[2022-02-05 12:09:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][800/1251] eta 1:34:10 lr 0.000055 time 73.9639 (12.5290) loss 0.2979 (0.3536) grad_norm 3.4510 (3.9055) mem 18239MB
[2022-02-05 12:29:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][900/1251] eta 1:13:00 lr 0.000061 time 0.3981 (12.4787) loss 0.2693 (0.3459) grad_norm 1.5775 (3.8091) mem 18239MB
[2022-02-05 12:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1000/1251] eta 0:52:00 lr 0.000068 time 18.3918 (12.4334) loss 0.2786 (0.3394) grad_norm 1.2491 (3.7356) mem 18239MB
[2022-02-05 13:10:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1100/1251] eta 0:31:18 lr 0.000074 time 0.4033 (12.4426) loss 0.2725 (0.3335) grad_norm 2.2311 (3.6312) mem 18239MB
[2022-02-05 13:30:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1200/1251] eta 0:10:32 lr 0.000080 time 31.6500 (12.4020) loss 0.2715 (0.3286) grad_norm 1.2720 (3.5534) mem 18239MB
[2022-02-05 13:39:44 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 0 training takes 4:17:18
[2022-02-05 13:39:44 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saving......
[2022-02-05 13:39:46 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saved !!!
[2022-02-05 13:39:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][0/1251] eta 1:01:34 lr 0.000083 time 2.9530 (2.9530) loss 0.2705 (0.2705) grad_norm 0.8280 (0.8280) mem 18239MB
[2022-02-05 13:41:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][100/1251] eta 0:14:17 lr 0.000090 time 0.6114 (0.7453) loss 0.2802 (0.2693) grad_norm 3.6450 (2.3059) mem 18239MB
[2022-02-05 13:42:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][200/1251] eta 0:14:40 lr 0.000096 time 0.7879 (0.8375) loss 0.2727 (0.2691) grad_norm 2.2279 (2.2994) mem 18239MB
[2022-02-05 13:44:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][300/1251] eta 0:13:41 lr 0.000103 time 0.4401 (0.8638) loss 0.2757 (0.2682) grad_norm 1.1539 (2.2752) mem 18239MB
[2022-02-05 13:45:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][400/1251] eta 0:11:34 lr 0.000109 time 0.4306 (0.8162) loss 0.2588 (0.2672) grad_norm 1.2593 (2.2458) mem 18239MB
[2022-02-05 13:46:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][500/1251] eta 0:10:38 lr 0.000115 time 0.5900 (0.8503) loss 0.2552 (0.2668) grad_norm 1.4727 (2.2056) mem 18240MB
[2022-02-05 13:47:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][600/1251] eta 0:08:45 lr 0.000122 time 0.4254 (0.8066) loss 0.2584 (0.2662) grad_norm 1.1834 (2.1712) mem 18240MB
[2022-02-05 13:48:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][700/1251] eta 0:06:56 lr 0.000128 time 0.4058 (0.7558) loss 0.2641 (0.2653) grad_norm 1.1315 (2.1186) mem 18240MB
[2022-02-05 13:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][800/1251] eta 0:05:41 lr 0.000134 time 0.4352 (0.7570) loss 0.2742 (0.2649) grad_norm 0.7488 (2.0964) mem 18240MB
[2022-02-05 13:51:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][900/1251] eta 0:04:35 lr 0.000141 time 0.4130 (0.7842) loss 0.2476 (0.2644) grad_norm 0.6401 (2.0539) mem 18240MB
[2022-02-05 13:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1000/1251] eta 0:03:08 lr 0.000147 time 0.4153 (0.7508) loss 0.2717 (0.2639) grad_norm 2.2334 (2.0098) mem 18240MB
[2022-02-05 13:53:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1100/1251] eta 0:01:51 lr 0.000154 time 0.4521 (0.7393) loss 0.2551 (0.2633) grad_norm 1.4980 (1.9817) mem 18240MB
[2022-02-05 13:55:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1200/1251] eta 0:00:39 lr 0.000160 time 0.4667 (0.7788) loss 0.2664 (0.2627) grad_norm 0.7340 (1.9572) mem 18240MB
[2022-02-05 13:56:06 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 1 training takes 0:16:20
[2022-02-05 13:56:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][0/1251] eta 1:16:36 lr 0.000163 time 3.6739 (3.6739) loss 0.2620 (0.2620) grad_norm 0.9611 (0.9611) mem 18240MB
[2022-02-05 13:56:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][100/1251] eta 0:09:22 lr 0.000169 time 0.4276 (0.4883) loss 0.2562 (0.2552) grad_norm 0.5311 (1.6903) mem 18240MB
[2022-02-05 13:58:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][200/1251] eta 0:11:20 lr 0.000176 time 0.4207 (0.6473) loss 0.2618 (0.2542) grad_norm 0.6081 (1.6235) mem 18240MB
[2022-02-05 13:59:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][300/1251] eta 0:09:36 lr 0.000182 time 0.4451 (0.6061) loss 0.2528 (0.2531) grad_norm 0.4520 (1.6033) mem 18240MB
[2022-02-05 14:00:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][400/1251] eta 0:09:29 lr 0.000189 time 0.4445 (0.6689) loss 0.2413 (0.2525) grad_norm 0.6562 (1.5654) mem 18240MB
[2022-02-05 14:01:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][500/1251] eta 0:08:08 lr 0.000195 time 2.1151 (0.6503) loss 0.2539 (0.2520) grad_norm 1.8790 (1.5394) mem 18240MB
[2022-02-05 14:03:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][600/1251] eta 0:07:55 lr 0.000201 time 0.5791 (0.7308) loss 0.2295 (0.2516) grad_norm 1.3565 (1.5373) mem 18240MB
[2022-02-05 14:04:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][700/1251] eta 0:06:53 lr 0.000208 time 0.4240 (0.7508) loss 0.2464 (0.2511) grad_norm 0.5189 (1.5236) mem 18240MB
[2022-02-05 14:06:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][800/1251] eta 0:05:48 lr 0.000214 time 2.2608 (0.7731) loss 0.2481 (0.2507) grad_norm 0.4695 (1.4909) mem 18240MB
[2022-02-05 14:08:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][900/1251] eta 0:04:40 lr 0.000220 time 0.4068 (0.7993) loss 0.2637 (0.2503) grad_norm 1.5514 (1.4829) mem 18240MB
[2022-02-05 14:09:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1000/1251] eta 0:03:23 lr 0.000227 time 1.3535 (0.8115) loss 0.2443 (0.2498) grad_norm 0.9744 (1.4653) mem 18240MB
[2022-02-05 14:11:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1100/1251] eta 0:02:04 lr 0.000233 time 0.4410 (0.8271) loss 0.2425 (0.2491) grad_norm 1.9947 (1.4427) mem 18240MB
[2022-02-05 14:12:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1200/1251] eta 0:00:42 lr 0.000239 time 0.4275 (0.8318) loss 0.2465 (0.2486) grad_norm 0.6265 (1.4366) mem 18240MB
[2022-02-05 14:13:21 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 2 training takes 0:17:15
[2022-02-05 14:13:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][0/1251] eta 1:24:40 lr 0.000243 time 4.0614 (4.0614) loss 0.2433 (0.2433) grad_norm 0.7067 (0.7067) mem 18240MB
[2022-02-05 14:14:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][100/1251] eta 0:09:27 lr 0.000249 time 0.4092 (0.4935) loss 0.2476 (0.2430) grad_norm 1.1159 (1.3134) mem 18240MB
[2022-02-05 14:15:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][200/1251] eta 0:10:07 lr 0.000255 time 0.3964 (0.5784) loss 0.2400 (0.2425) grad_norm 0.3384 (1.2386) mem 18240MB
[2022-02-05 14:16:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][300/1251] eta 0:08:53 lr 0.000262 time 0.4966 (0.5605) loss 0.2404 (0.2416) grad_norm 0.3401 (1.1964) mem 18240MB
[2022-02-05 14:17:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][400/1251] eta 0:09:07 lr 0.000268 time 0.7116 (0.6430) loss 0.2314 (0.2411) grad_norm 1.4900 (1.2040) mem 18240MB
[2022-02-05 14:19:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][500/1251] eta 0:09:34 lr 0.000275 time 0.4066 (0.7646) loss 0.2282 (0.2405) grad_norm 0.5011 (1.2036) mem 18240MB
[2022-02-05 14:21:30 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][600/1251] eta 0:08:49 lr 0.000281 time 0.4160 (0.8126) loss 0.2414 (0.2404) grad_norm 0.9795 (1.1974) mem 18240MB
[2022-02-05 14:23:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][700/1251] eta 0:07:41 lr 0.000287 time 0.4862 (0.8377) loss 0.2334 (0.2401) grad_norm 0.4512 (1.1759) mem 18240MB
[2022-02-05 14:24:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][800/1251] eta 0:06:27 lr 0.000294 time 0.5067 (0.8583) loss 0.2418 (0.2398) grad_norm 1.2394 (1.1746) mem 18240MB
[2022-02-05 14:26:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][900/1251] eta 0:05:03 lr 0.000300 time 0.4366 (0.8635) loss 0.2361 (0.2394) grad_norm 0.5397 (1.1654) mem 18240MB
[2022-02-05 14:27:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1000/1251] eta 0:03:37 lr 0.000306 time 1.1073 (0.8658) loss 0.2352 (0.2390) grad_norm 0.6021 (1.1541) mem 18241MB
[2022-02-05 14:29:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1100/1251] eta 0:02:08 lr 0.000313 time 0.5436 (0.8526) loss 0.2295 (0.2387) grad_norm 1.1236 (1.1382) mem 18241MB
[2022-02-05 14:30:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1200/1251] eta 0:00:44 lr 0.000319 time 0.4830 (0.8667) loss 0.2486 (0.2385) grad_norm 0.4466 (1.1277) mem 18241MB
[2022-02-05 14:31:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 3 training takes 0:18:11
[2022-02-05 14:31:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][0/1251] eta 1:30:19 lr 0.000322 time 4.3320 (4.3320) loss 0.2309 (0.2309) grad_norm 0.3473 (0.3473) mem 18241MB
[2022-02-05 14:32:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][100/1251] eta 0:09:42 lr 0.000329 time 0.4199 (0.5059) loss 0.2313 (0.2347) grad_norm 0.9780 (0.9537) mem 18241MB
[2022-02-05 14:33:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][200/1251] eta 0:10:42 lr 0.000335 time 0.4042 (0.6115) loss 0.2380 (0.2338) grad_norm 0.4685 (0.9641) mem 18241MB
[2022-02-05 14:35:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][300/1251] eta 0:12:57 lr 0.000341 time 0.4448 (0.8171) loss 0.2274 (0.2339) grad_norm 0.5854 (0.9808) mem 18241MB
[2022-02-05 14:37:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][400/1251] eta 0:12:15 lr 0.000348 time 0.9284 (0.8642) loss 0.2300 (0.2342) grad_norm 0.5273 (0.9884) mem 18241MB
[2022-02-05 14:38:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][500/1251] eta 0:10:11 lr 0.000354 time 0.4123 (0.8136) loss 0.2346 (0.2337) grad_norm 0.7111 (0.9791) mem 18241MB
[2022-02-05 14:39:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][600/1251] eta 0:09:01 lr 0.000361 time 0.4479 (0.8323) loss 0.2305 (0.2336) grad_norm 0.7723 (0.9726) mem 18241MB
[2022-02-05 14:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][700/1251] eta 0:07:37 lr 0.000367 time 1.2561 (0.8298) loss 0.2416 (0.2333) grad_norm 0.7113 (0.9652) mem 18241MB
[2022-02-05 14:43:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][800/1251] eta 0:06:29 lr 0.000373 time 2.0054 (0.8637) loss 0.2229 (0.2332) grad_norm 0.3053 (0.9582) mem 18241MB
[2022-02-05 14:44:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][900/1251] eta 0:05:07 lr 0.000380 time 2.2077 (0.8764) loss 0.2203 (0.2330) grad_norm 0.9912 (0.9536) mem 18241MB
[2022-02-05 14:46:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1000/1251] eta 0:03:41 lr 0.000386 time 0.4317 (0.8842) loss 0.2330 (0.2327) grad_norm 0.4332 (0.9454) mem 18241MB
[2022-02-05 14:47:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1100/1251] eta 0:02:09 lr 0.000392 time 1.9930 (0.8594) loss 0.2376 (0.2325) grad_norm 0.3494 (0.9425) mem 18241MB
[2022-02-05 14:49:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1200/1251] eta 0:00:44 lr 0.000399 time 3.0251 (0.8816) loss 0.2229 (0.2322) grad_norm 0.3280 (0.9404) mem 18241MB
[2022-02-05 14:49:56 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 4 training takes 0:18:23
[2022-02-05 14:50:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][0/1251] eta 1:20:39 lr 0.000402 time 3.8685 (3.8685) loss 0.2361 (0.2361) grad_norm 0.2441 (0.2441) mem 18241MB
[2022-02-05 14:50:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][100/1251] eta 0:09:10 lr 0.000408 time 0.4087 (0.4786) loss 0.2268 (0.2297) grad_norm 0.4077 (0.9384) mem 18241MB
[2022-02-05 14:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][200/1251] eta 0:11:06 lr 0.000415 time 0.4741 (0.6344) loss 0.2332 (0.2293) grad_norm 0.6293 (0.8776) mem 18241MB
[2022-02-05 14:53:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][300/1251] eta 0:09:43 lr 0.000421 time 0.4483 (0.6141) loss 0.4291 (0.2770) grad_norm 0.1236 (nan) mem 18241MB
[2022-02-05 14:54:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][400/1251] eta 0:09:25 lr 0.000427 time 0.4158 (0.6646) loss 0.4575 (0.3163) grad_norm 0.7630 (nan) mem 18241MB
[2022-02-05 14:55:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][500/1251] eta 0:07:59 lr 0.000434 time 0.4047 (0.6385) loss 0.4399 (0.3400) grad_norm 1.2759 (nan) mem 18241MB
[2022-02-05 14:57:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][600/1251] eta 0:07:54 lr 0.000440 time 0.3981 (0.7282) loss 0.5130 (0.3663) grad_norm 0.0916 (nan) mem 18241MB
[2022-02-05 14:58:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][700/1251] eta 0:06:42 lr 0.000446 time 0.4456 (0.7310) loss 0.4537 (0.3812) grad_norm 0.5641 (nan) mem 18241MB
[2022-02-05 14:59:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][800/1251] eta 0:05:14 lr 0.000453 time 0.4507 (0.6973) loss 0.5098 (0.3919) grad_norm 0.3081 (nan) mem 18241MB
[2022-02-05 15:01:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][900/1251] eta 0:04:22 lr 0.000459 time 0.4445 (0.7472) loss 0.4913 (0.4046) grad_norm 0.0084 (nan) mem 18241MB
[2022-02-05 15:03:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1000/1251] eta 0:03:16 lr 0.000466 time 3.5568 (0.7843) loss 0.5040 (0.4146) grad_norm 0.0242 (nan) mem 18241MB
[2022-02-05 15:04:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1100/1251] eta 0:01:58 lr 0.000472 time 0.4314 (0.7874) loss 0.4915 (0.4226) grad_norm 0.2421 (nan) mem 18241MB
[2022-02-05 15:05:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1200/1251] eta 0:00:38 lr 0.000478 time 0.4317 (0.7606) loss 0.5508 (0.4266) grad_norm 0.0683 (nan) mem 18241MB
[2022-02-05 15:05:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 5 training takes 0:15:37
[2022-02-05 15:05:33 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saving......
[2022-02-05 15:05:36 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saved !!!
[2022-02-05 15:05:40 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][0/1251] eta 1:23:58 lr 0.000481 time 4.0272 (4.0272) loss 0.4738 (0.4738) grad_norm 1.3644 (1.3644) mem 18241MB
[2022-02-05 15:07:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][100/1251] eta 0:16:57 lr 0.000488 time 0.4414 (0.8841) loss 0.4598 (0.4497) grad_norm 0.1361 (nan) mem 18241MB
[2022-02-05 15:08:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][200/1251] eta 0:16:22 lr 0.000494 time 0.4016 (0.9352) loss 0.5048 (0.4722) grad_norm 0.0122 (nan) mem 18241MB
[2022-02-05 15:10:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][300/1251] eta 0:14:59 lr 0.000501 time 1.8795 (0.9461) loss 0.4727 (0.4830) grad_norm 0.0052 (nan) mem 18241MB
[2022-02-05 15:12:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][400/1251] eta 0:13:44 lr 0.000507 time 1.0088 (0.9684) loss 0.4677 (0.4865) grad_norm 0.0896 (nan) mem 18241MB
[2022-02-05 15:13:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][500/1251] eta 0:11:58 lr 0.000513 time 0.4878 (0.9563) loss 0.5154 (0.4816) grad_norm 24.0200 (nan) mem 18241MB
[2022-02-05 15:15:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][600/1251] eta 0:10:28 lr 0.000520 time 0.4229 (0.9660) loss 0.4594 (0.4808) grad_norm 0.9102 (nan) mem 18243MB
[2022-02-05 15:16:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][700/1251] eta 0:08:53 lr 0.000526 time 6.5413 (0.9676) loss 0.4411 (0.4790) grad_norm 0.7869 (nan) mem 18243MB
[2022-02-05 15:18:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][800/1251] eta 0:07:16 lr 0.000532 time 0.4438 (0.9674) loss 0.4367 (0.4746) grad_norm 1.4051 (nan) mem 18243MB
[2022-02-05 15:20:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][900/1251] eta 0:05:41 lr 0.000539 time 3.8164 (0.9736) loss 0.4383 (0.4707) grad_norm 0.0261 (nan) mem 18243MB
[2022-02-05 15:21:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1000/1251] eta 0:04:03 lr 0.000545 time 1.6960 (0.9705) loss 0.4484 (0.4665) grad_norm 21.2195 (nan) mem 18243MB
[2022-02-05 15:23:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1100/1251] eta 0:02:26 lr 0.000552 time 0.4055 (0.9699) loss 0.4562 (0.4642) grad_norm 1.7039 (nan) mem 18243MB
[2022-02-05 15:24:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1200/1251] eta 0:00:49 lr 0.000558 time 0.6191 (0.9622) loss 0.4597 (0.4641) grad_norm 1.6285 (nan) mem 18243MB
[2022-02-05 15:25:37 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 6 training takes 0:20:01
[2022-02-05 15:25:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][0/1251] eta 1:24:51 lr 0.000561 time 4.0702 (4.0702) loss 0.4520 (0.4520) grad_norm 0.2361 (0.2361) mem 18243MB
[2022-02-05 15:26:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][100/1251] eta 0:09:22 lr 0.000567 time 0.4117 (0.4889) loss 0.4651 (0.4644) grad_norm 0.0263 (1.5361) mem 18243MB
[2022-02-05 15:27:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][200/1251] eta 0:08:14 lr 0.000574 time 0.5824 (0.4702) loss 0.4427 (0.4608) grad_norm 0.4953 (2.5894) mem 18243MB
[2022-02-05 15:27:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][300/1251] eta 0:07:20 lr 0.000580 time 0.4171 (0.4631) loss 0.4863 (0.4698) grad_norm 0.0398 (2.0401) mem 18243MB
[2022-02-05 15:30:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][400/1251] eta 0:09:53 lr 0.000587 time 0.4244 (0.6975) loss 0.4536 (0.4673) grad_norm 0.3069 (1.8147) mem 18243MB
[2022-02-05 15:32:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][500/1251] eta 0:09:52 lr 0.000593 time 0.4264 (0.7884) loss 0.4325 (0.4624) grad_norm 0.1347 (2.3873) mem 18243MB
[2022-02-05 15:33:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][600/1251] eta 0:08:55 lr 0.000599 time 1.4852 (0.8225) loss 0.4847 (0.4582) grad_norm 9.4087 (2.4801) mem 18243MB
[2022-02-05 15:35:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][700/1251] eta 0:07:38 lr 0.000606 time 0.4803 (0.8325) loss 0.4958 (0.4663) grad_norm 0.0721 (2.2051) mem 18243MB
[2022-02-05 15:37:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][800/1251] eta 0:06:24 lr 0.000612 time 0.4407 (0.8519) loss 0.5036 (0.4695) grad_norm 0.0279 (2.1894) mem 18243MB
[2022-02-05 15:38:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][900/1251] eta 0:05:05 lr 0.000618 time 1.0429 (0.8691) loss 0.4598 (0.4725) grad_norm 0.3491 (1.9502) mem 18243MB
[2022-02-05 15:40:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1000/1251] eta 0:03:39 lr 0.000625 time 0.4146 (0.8731) loss 0.4447 (0.4727) grad_norm 0.0666 (1.8049) mem 18243MB
[2022-02-05 15:41:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1100/1251] eta 0:02:13 lr 0.000631 time 0.4089 (0.8846) loss 0.5773 (0.4706) grad_norm 533.4438 (2.1825) mem 18243MB
[2022-02-05 15:43:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1200/1251] eta 0:00:45 lr 0.000637 time 0.4819 (0.8905) loss 0.4459 (0.4708) grad_norm 0.2206 (inf) mem 18243MB
[2022-02-05 15:44:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 7 training takes 0:18:32
[2022-02-05 15:44:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][0/1251] eta 1:22:31 lr 0.000641 time 3.9582 (3.9582) loss 0.4379 (0.4379) grad_norm 0.1034 (0.1034) mem 18243MB
[2022-02-05 15:44:59 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][100/1251] eta 0:09:21 lr 0.000647 time 0.4268 (0.4878) loss 0.4240 (0.4471) grad_norm 0.1163 (0.5972) mem 18243MB
[2022-02-05 15:46:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][200/1251] eta 0:12:24 lr 0.000653 time 0.4756 (0.7080) loss 0.5335 (0.4479) grad_norm 0.6204 (5.4569) mem 18243MB
[2022-02-05 15:47:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][300/1251] eta 0:09:50 lr 0.000660 time 0.4158 (0.6213) loss 0.5053 (0.4720) grad_norm 0.0163 (3.8024) mem 18243MB
[2022-02-05 15:48:02 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][400/1251] eta 0:08:11 lr 0.000666 time 0.4401 (0.5773) loss 0.4971 (0.4803) grad_norm 0.0055 (2.8562) mem 18243MB
[2022-02-05 15:49:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][500/1251] eta 0:07:22 lr 0.000673 time 1.8764 (0.5890) loss 0.5002 (0.4848) grad_norm 0.0067 (2.2872) mem 18243MB
[2022-02-05 15:51:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][600/1251] eta 0:07:31 lr 0.000679 time 0.4090 (0.6942) loss 0.4947 (0.4882) grad_norm 0.0027 (1.9076) mem 18243MB
[2022-02-05 15:52:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][700/1251] eta 0:06:50 lr 0.000685 time 2.9712 (0.7456) loss 0.5094 (0.4906) grad_norm 0.0018 (1.6364) mem 18243MB
[2022-02-05 15:53:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][800/1251] eta 0:05:29 lr 0.000692 time 0.5231 (0.7305) loss 0.5050 (0.4927) grad_norm 0.0023 (1.4328) mem 18243MB
[2022-02-05 15:55:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][900/1251] eta 0:04:29 lr 0.000698 time 0.4867 (0.7679) loss 0.5158 (0.4942) grad_norm 0.0031 (1.2744) mem 18243MB
[2022-02-05 15:57:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1000/1251] eta 0:03:18 lr 0.000704 time 2.5024 (0.7920) loss 0.5137 (0.4952) grad_norm 0.0069 (1.1477) mem 18243MB
[2022-02-05 15:58:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1100/1251] eta 0:01:56 lr 0.000711 time 0.4010 (0.7713) loss 0.5179 (0.4962) grad_norm 0.0027 (1.0440) mem 18243MB
[2022-02-05 16:00:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1200/1251] eta 0:00:40 lr 0.000717 time 0.3946 (0.7944) loss 0.5119 (0.4969) grad_norm 0.0025 (0.9576) mem 18243MB
[2022-02-05 16:00:52 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 8 training takes 0:16:41
[2022-02-05 16:00:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][0/1251] eta 1:20:40 lr 0.000720 time 3.8696 (3.8696) loss 0.4855 (0.4855) grad_norm 0.0029 (0.0029) mem 18243MB
[2022-02-05 16:01:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][100/1251] eta 0:09:21 lr 0.000727 time 0.4418 (0.4877) loss 0.5036 (0.5047) grad_norm 0.0077 (0.0063) mem 18243MB
[2022-02-05 16:02:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][200/1251] eta 0:08:13 lr 0.000733 time 0.4427 (0.4691) loss 0.5000 (0.5045) grad_norm 0.0043 (0.0063) mem 18243MB
[2022-02-05 16:03:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][300/1251] eta 0:07:20 lr 0.000739 time 0.4323 (0.4635) loss 0.5210 (0.5047) grad_norm 0.0036 (0.0064) mem 18243MB
[2022-02-05 16:05:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][400/1251] eta 0:08:49 lr 0.000746 time 0.4176 (0.6220) loss 0.4839 (0.5052) grad_norm 0.0049 (0.0073) mem 18243MB
[2022-02-05 16:06:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][500/1251] eta 0:09:07 lr 0.000752 time 0.4086 (0.7284) loss 0.4946 (0.5054) grad_norm 0.0034 (0.0072) mem 18243MB
[2022-02-05 16:08:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][600/1251] eta 0:08:17 lr 0.000759 time 0.4523 (0.7643) loss 0.5037 (0.5055) grad_norm 0.0185 (0.0070) mem 18243MB
[2022-02-05 16:10:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][700/1251] eta 0:07:18 lr 0.000765 time 0.4846 (0.7965) loss 0.5141 (0.5057) grad_norm 0.0029 (0.0071) mem 18243MB
[2022-02-05 16:11:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][800/1251] eta 0:06:11 lr 0.000771 time 0.5237 (0.8228) loss 0.4947 (0.5055) grad_norm 0.0037 (0.0071) mem 18243MB
[2022-02-05 16:13:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][900/1251] eta 0:04:50 lr 0.000778 time 0.4529 (0.8269) loss 0.5303 (0.5055) grad_norm 0.0031 (0.0073) mem 18243MB
[2022-02-05 16:14:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1000/1251] eta 0:03:28 lr 0.000784 time 5.7999 (0.8313) loss 0.5151 (0.5056) grad_norm 0.0050 (0.0074) mem 18243MB
[2022-02-05 16:16:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1100/1251] eta 0:02:07 lr 0.000790 time 1.1566 (0.8422) loss 0.4930 (0.5055) grad_norm 0.0044 (0.0074) mem 18243MB
[2022-02-05 16:17:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1200/1251] eta 0:00:43 lr 0.000797 time 0.5183 (0.8531) loss 0.4922 (0.5056) grad_norm 0.0028 (0.0076) mem 18243MB
[2022-02-05 16:18:39 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 9 training takes 0:17:46
[2022-02-05 16:18:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][0/1251] eta 1:24:03 lr 0.000800 time 4.0314 (4.0314) loss 0.5028 (0.5028) grad_norm 0.0046 (0.0046) mem 18243MB
[2022-02-05 16:19:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][100/1251] eta 0:09:18 lr 0.000781 time 0.4616 (0.4852) loss 0.5053 (0.5051) grad_norm 0.0029 (0.0082) mem 18243MB
[2022-02-05 16:20:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][200/1251] eta 0:10:45 lr 0.000781 time 1.2564 (0.6146) loss 0.5208 (0.5047) grad_norm 0.0030 (0.0077) mem 18243MB
[2022-02-05 16:22:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][300/1251] eta 0:13:19 lr 0.000781 time 0.4131 (0.8408) loss 0.5163 (0.5054) grad_norm 0.0067 (0.0078) mem 18243MB
[2022-02-05 16:24:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][400/1251] eta 0:12:00 lr 0.000780 time 0.4386 (0.8464) loss 0.5159 (0.5057) grad_norm 0.0075 (0.0083) mem 18243MB
[2022-02-05 16:25:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][500/1251] eta 0:09:37 lr 0.000780 time 0.4158 (0.7694) loss 0.5114 (0.5056) grad_norm 0.0055 (0.0083) mem 18243MB
[2022-02-05 16:26:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][600/1251] eta 0:08:21 lr 0.000780 time 0.4583 (0.7696) loss 0.5191 (0.5058) grad_norm 0.0064 (0.0083) mem 18243MB
[2022-02-05 16:27:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][700/1251] eta 0:06:53 lr 0.000779 time 0.4195 (0.7505) loss 0.4864 (0.5056) grad_norm 0.0081 (0.0085) mem 18243MB
[2022-02-05 16:29:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][800/1251] eta 0:06:11 lr 0.000779 time 1.3727 (0.8233) loss 0.4949 (0.5058) grad_norm 0.0031 (0.0089) mem 18243MB
[2022-02-05 16:31:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][900/1251] eta 0:04:56 lr 0.000779 time 8.4577 (0.8454) loss 0.5168 (0.5056) grad_norm 0.0051 (0.0087) mem 18243MB
[2022-02-05 16:32:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1000/1251] eta 0:03:30 lr 0.000778 time 0.4084 (0.8387) loss 0.5202 (0.5056) grad_norm 0.0031 (0.0088) mem 18243MB
[2022-02-05 16:34:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1100/1251] eta 0:02:08 lr 0.000778 time 0.4177 (0.8530) loss 0.5111 (0.5056) grad_norm 0.0056 (0.0087) mem 18243MB
[2022-02-05 16:35:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1200/1251] eta 0:00:44 lr 0.000778 time 0.4621 (0.8640) loss 0.4990 (0.5055) grad_norm 0.0050 (0.0088) mem 18243MB
[2022-02-05 16:36:41 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 10 training takes 0:18:02
[2022-02-05 16:36:41 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saving......
[2022-02-05 16:36:44 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saved !!!
[2022-02-05 16:36:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][0/1251] eta 1:13:03 lr 0.000778 time 3.5042 (3.5042) loss 0.5118 (0.5118) grad_norm 0.0109 (0.0109) mem 18243MB
[2022-02-05 16:37:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][100/1251] eta 0:12:09 lr 0.000777 time 0.5750 (0.6336) loss 0.5325 (0.5074) grad_norm 0.0052 (0.0076) mem 18243MB
[2022-02-05 16:39:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][200/1251] eta 0:14:14 lr 0.000777 time 0.4904 (0.8130) loss 0.5064 (0.5059) grad_norm 0.0036 (0.0104) mem 18243MB
[2022-02-05 16:41:03 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][300/1251] eta 0:13:38 lr 0.000777 time 0.5833 (0.8607) loss 0.4996 (0.5055) grad_norm 0.0044 (0.0099) mem 18243MB
[2022-02-05 16:42:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][400/1251] eta 0:12:40 lr 0.000776 time 1.4644 (0.8932) loss 0.5054 (0.5057) grad_norm 0.0049 (0.0093) mem 18243MB
[2022-02-05 16:43:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][500/1251] eta 0:10:45 lr 0.000776 time 0.4532 (0.8590) loss 0.4886 (0.5055) grad_norm 0.0048 (0.0096) mem 18243MB
[2022-02-05 16:45:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][600/1251] eta 0:09:28 lr 0.000776 time 0.4424 (0.8730) loss 0.5031 (0.5053) grad_norm 0.0035 (0.0096) mem 18243MB
[2022-02-05 16:46:58 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][700/1251] eta 0:08:02 lr 0.000775 time 0.4489 (0.8760) loss 0.5361 (0.5057) grad_norm 0.0055 (0.0094) mem 18243MB
[2022-02-05 16:47:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][800/1251] eta 0:06:11 lr 0.000775 time 0.4341 (0.8229) loss 0.4948 (0.5057) grad_norm 0.0069 (0.0095) mem 18243MB
[2022-02-05 16:49:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][900/1251] eta 0:04:55 lr 0.000775 time 0.4356 (0.8426) loss 0.5100 (0.5056) grad_norm 0.0119 (0.0096) mem 18243MB
[2022-02-05 16:50:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1000/1251] eta 0:03:23 lr 0.000774 time 0.6975 (0.8099) loss 0.5050 (0.5058) grad_norm 0.0047 (0.0097) mem 18243MB
[2022-02-05 16:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1100/1251] eta 0:02:06 lr 0.000774 time 1.0458 (0.8359) loss 0.5302 (0.5060) grad_norm 0.0044 (0.0095) mem 18243MB
[2022-02-05 16:53:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1200/1251] eta 0:00:43 lr 0.000773 time 0.4221 (0.8547) loss 0.5000 (0.5061) grad_norm 0.0074 (0.0096) mem 18243MB
[2022-02-05 16:54:34 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 11 training takes 0:17:50
[2022-02-05 16:54:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][0/1251] eta 1:16:28 lr 0.000773 time 3.6676 (3.6676) loss 0.5215 (0.5215) grad_norm 0.0116 (0.0116) mem 18243MB
[2022-02-05 16:55:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][100/1251] eta 0:14:50 lr 0.000773 time 0.7810 (0.7737) loss 0.5164 (0.5068) grad_norm 0.0069 (0.0073) mem 18243MB
[2022-02-05 16:57:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][200/1251] eta 0:14:47 lr 0.000773 time 1.7244 (0.8448) loss 0.5187 (0.5058) grad_norm 0.0096 (0.0076) mem 18243MB
[2022-02-05 16:58:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][300/1251] eta 0:13:41 lr 0.000772 time 0.4478 (0.8634) loss 0.4969 (0.5061) grad_norm 0.0054 (0.0086) mem 18243MB
[2022-02-05 17:00:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][400/1251] eta 0:11:49 lr 0.000772 time 0.6186 (0.8340) loss 0.5156 (0.5064) grad_norm 0.0082 (0.0085) mem 18243MB
[2022-02-05 17:01:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][500/1251] eta 0:11:01 lr 0.000772 time 0.4288 (0.8806) loss 0.5079 (0.5062) grad_norm 0.0048 (0.0085) mem 18243MB
[2022-02-05 17:03:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][600/1251] eta 0:09:45 lr 0.000771 time 0.4818 (0.8992) loss 0.4947 (0.5059) grad_norm 0.0039 (0.0084) mem 18243MB
[2022-02-05 17:05:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][700/1251] eta 0:08:20 lr 0.000771 time 0.4087 (0.9092) loss 0.4817 (0.5060) grad_norm 0.0042 (0.0083) mem 18243MB
[2022-02-05 17:06:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][800/1251] eta 0:06:54 lr 0.000770 time 3.8632 (0.9187) loss 0.5162 (0.5057) grad_norm 0.0056 (0.0085) mem 18243MB
[2022-02-05 17:08:22 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][900/1251] eta 0:05:22 lr 0.000770 time 0.6304 (0.9192) loss 0.5097 (0.5058) grad_norm 0.0882 (0.0122) mem 18243MB
[2022-02-05 17:09:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1000/1251] eta 0:03:50 lr 0.000770 time 0.4901 (0.9200) loss 0.5016 (0.5059) grad_norm 0.0458 (0.0284) mem 18243MB
[2022-02-05 17:11:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1100/1251] eta 0:02:20 lr 0.000769 time 4.5322 (0.9288) loss 0.4981 (0.5058) grad_norm 0.0095 (0.0943) mem 18243MB
[2022-02-05 17:13:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1200/1251] eta 0:00:47 lr 0.000769 time 0.6195 (0.9275) loss 0.5148 (0.5059) grad_norm 0.0085 (0.0929) mem 18243MB
[2022-02-05 17:13:54 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 12 training takes 0:19:19
[2022-02-05 17:13:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][0/1251] eta 1:16:38 lr 0.000769 time 3.6756 (3.6756) loss 0.5141 (0.5141) grad_norm 0.0134 (0.0134) mem 18243MB
[2022-02-05 17:14:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][100/1251] eta 0:09:08 lr 0.000768 time 0.4265 (0.4769) loss 0.4834 (0.5059) grad_norm 0.0057 (0.0109) mem 18243MB
[2022-02-05 17:16:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][200/1251] eta 0:11:38 lr 0.000768 time 0.4268 (0.6645) loss 0.5068 (0.5065) grad_norm 0.0053 (0.0105) mem 18243MB
[2022-02-05 17:17:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][300/1251] eta 0:12:06 lr 0.000768 time 0.5157 (0.7636) loss 0.4987 (0.5058) grad_norm 0.0068 (0.0111) mem 18243MB
[2022-02-05 17:19:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][400/1251] eta 0:11:13 lr 0.000767 time 0.4169 (0.7910) loss 0.5224 (0.5060) grad_norm 0.0052 (0.0109) mem 18243MB
[2022-02-05 17:20:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][500/1251] eta 0:09:47 lr 0.000767 time 0.5883 (0.7817) loss 0.4828 (0.5060) grad_norm 0.0175 (0.0106) mem 18243MB
[2022-02-05 17:21:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][600/1251] eta 0:08:12 lr 0.000766 time 3.2032 (0.7566) loss 0.5117 (0.5061) grad_norm 0.0134 (0.0104) mem 18243MB
[2022-02-05 17:23:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][700/1251] eta 0:07:20 lr 0.000766 time 0.4028 (0.8003) loss 0.5239 (0.5060) grad_norm 0.0044 (0.0108) mem 18243MB
[2022-02-05 17:25:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][800/1251] eta 0:06:15 lr 0.000766 time 0.3985 (0.8326) loss 0.5014 (0.5062) grad_norm 0.0114 (0.0110) mem 18243MB
[2022-02-05 17:26:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][900/1251] eta 0:04:58 lr 0.000765 time 0.6148 (0.8492) loss 0.5003 (0.5062) grad_norm 0.0027 (0.0109) mem 18243MB
[2022-02-05 17:28:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1000/1251] eta 0:03:36 lr 0.000765 time 0.4273 (0.8616) loss 0.4993 (0.5062) grad_norm 0.0140 (0.0108) mem 18243MB
[2022-02-05 17:30:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1100/1251] eta 0:02:12 lr 0.000764 time 0.4532 (0.8779) loss 0.5004 (0.5061) grad_norm 0.0107 (0.0109) mem 18243MB
[2022-02-05 17:31:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1200/1251] eta 0:00:44 lr 0.000764 time 0.5378 (0.8783) loss 0.4941 (0.5061) grad_norm 0.0027 (0.0106) mem 18243MB
[2022-02-05 17:32:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 13 training takes 0:18:16
[2022-02-05 17:32:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][0/1251] eta 1:28:50 lr 0.000764 time 4.2612 (4.2612) loss 0.4906 (0.4906) grad_norm 0.0057 (0.0057) mem 18243MB
[2022-02-05 17:33:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][100/1251] eta 0:09:30 lr 0.000763 time 0.4356 (0.4958) loss 0.4973 (0.5057) grad_norm 0.0031 (0.0087) mem 18243MB
[2022-02-05 17:34:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][200/1251] eta 0:11:46 lr 0.000763 time 0.4247 (0.6724) loss 0.5087 (0.5059) grad_norm 0.0055 (0.0088) mem 18243MB
[2022-02-05 17:36:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][300/1251] eta 0:13:02 lr 0.000763 time 0.5249 (0.8233) loss 0.5212 (0.5071) grad_norm 0.0065 (0.0095) mem 18243MB
[2022-02-05 17:37:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][400/1251] eta 0:12:15 lr 0.000762 time 0.6521 (0.8638) loss 0.4970 (0.5065) grad_norm 0.0145 (0.0094) mem 18243MB
[2022-02-05 17:39:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][500/1251] eta 0:11:08 lr 0.000762 time 1.7114 (0.8902) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][600/1251] eta 0:09:49 lr 0.000761 time 3.6215 (0.9061) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:42:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][700/1251] eta 0:08:23 lr 0.000761 time 0.5390 (0.9145) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:44:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][800/1251] eta 0:06:54 lr 0.000761 time 0.8262 (0.9181) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:46:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][900/1251] eta 0:05:25 lr 0.000760 time 0.7292 (0.9266) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:47:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1000/1251] eta 0:03:53 lr 0.000760 time 0.5330 (0.9303) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:49:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1100/1251] eta 0:02:20 lr 0.000759 time 1.6054 (0.9295) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:50:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1200/1251] eta 0:00:47 lr 0.000759 time 0.4148 (0.9302) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:51:29 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 14 training takes 0:19:19
[2022-02-05 17:51:33 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][0/1251] eta 1:25:26 lr 0.000759 time 4.0980 (4.0980) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][100/1251] eta 0:09:03 lr 0.000758 time 0.4545 (0.4721) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:53:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][200/1251] eta 0:11:56 lr 0.000758 time 0.4286 (0.6820) loss nan (nan) grad_norm nan (nan) mem 18243MB
[2022-02-05 17:54:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][300/1251] eta 0:10:23 lr 0.000757 time 0.4350 (0.6558) loss nan (nan) grad_norm nan (nan) mem 18243MB
I did not modify any of the configs except for specifying --accumulation-steps 2 on the command line so the run would fit in memory on an 8-GPU machine. I'm using CUDA 11.1, cuDNN 8, and PyTorch 1.9.0, which should be recent enough. Could you help take a look at what went wrong and how to fix it?
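For completeness, here is the batch-size arithmetic I had in mind when adding accumulation (a quick sketch; the per-GPU batch size of 128 is an assumption taken from the default config, not something this log confirms):

```python
# Effective batch size under gradient accumulation (sketch).
per_gpu_batch = 128   # DATA.BATCH_SIZE -- assumed default, check your YAML
n_gpus = 8            # --nproc_per_node 8
accum_steps = 2       # --accumulation-steps 2

effective_batch = per_gpu_batch * n_gpus * accum_steps
print(effective_batch)  # 2048, i.e. the same as 16 GPUs without accumulation
```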
Thank you!
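P.S. While waiting, I'm wrapping the training step with a guard like the one below so the first non-finite loss is caught instead of silently corrupting the weights (a minimal sketch; `model(img, mask)` returning the reconstruction loss is my reading of main_simmim.py, and `guarded_step`/`max_norm` are my own names, not the repo's):

```python
import torch

def guarded_step(model, optimizer, img, mask, max_norm=5.0):
    # Sketch of one pretraining step with a non-finite-loss guard.
    loss = model(img, mask)
    if not torch.isfinite(loss):
        # Drop this update instead of writing NaN into the weights.
        optimizer.zero_grad()
        return None
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Running a short stretch with torch.autograd.set_detect_anomaly(True) should also point at the op that first produces the NaN, at the cost of much slower iterations.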
I'm running into the same problem. How did you solve it?