Loss goes NaN after 14 epochs

Open · DianCh opened this issue 3 years ago · 1 comment

Hi, thank you for releasing this wonderful work. I tried to replicate the results using the following command:

python -m torch.distributed.launch --nproc_per_node 8 main_simmim.py --cfg configs/swin_base__100ep/simmim_pretrain__swin_base__img192_window6__100ep.yaml --data-path /mnt/fsx/datasets/imagenet/train --accumulation-steps 2
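
(For reference, and assuming the per-GPU batch size B in the YAML is left at its default, the effective batch size of this launch works out to nproc_per_node × B × accumulation_steps = 8 × B × 2, so if the original recipe used 16 GPUs without accumulation, the total batch size should match.)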

which gave me NaN loss after 14 epochs:

[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 22): INFO >>>>>>>>>> Build Optimizer for Pre-training Stage
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 27): INFO No weight decay: {'encoder.mask_token', 'encoder.absolute_pos_embed'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 30): INFO No weight decay keywords: {'encoder.relative_position_bias_table'}
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 63): INFO No decay params: ['encoder.mask_token', 'encoder.patch_embed.proj.bias', 'encoder.patch_embed.norm.weight', 'encoder.patch_embed.norm.bias', 'encoder.layers.0.blocks.0.norm1.weight', 'encoder.layers.0.blocks.0.norm1.bias', 'encoder.layers.0.blocks.0.attn.qkv.bias', 'encoder.layers.0.blocks.0.attn.proj.bias', 'encoder.layers.0.blocks.0.norm2.weight', 'encoder.layers.0.blocks.0.norm2.bias', 'encoder.layers.0.blocks.0.mlp.fc1.bias', 'encoder.layers.0.blocks.0.mlp.fc2.bias', 'encoder.layers.0.blocks.1.norm1.weight', 'encoder.layers.0.blocks.1.norm1.bias', 'encoder.layers.0.blocks.1.attn.qkv.bias', 'encoder.layers.0.blocks.1.attn.proj.bias', 'encoder.layers.0.blocks.1.norm2.weight', 'encoder.layers.0.blocks.1.norm2.bias', 'encoder.layers.0.blocks.1.mlp.fc1.bias', 'encoder.layers.0.blocks.1.mlp.fc2.bias', 'encoder.layers.0.downsample.norm.weight', 'encoder.layers.0.downsample.norm.bias', 'encoder.layers.1.blocks.0.norm1.weight', 'encoder.layers.1.blocks.0.norm1.bias', 'encoder.layers.1.blocks.0.attn.qkv.bias', 'encoder.layers.1.blocks.0.attn.proj.bias', 'encoder.layers.1.blocks.0.norm2.weight', 'encoder.layers.1.blocks.0.norm2.bias', 'encoder.layers.1.blocks.0.mlp.fc1.bias', 'encoder.layers.1.blocks.0.mlp.fc2.bias', 'encoder.layers.1.blocks.1.norm1.weight', 'encoder.layers.1.blocks.1.norm1.bias', 'encoder.layers.1.blocks.1.attn.qkv.bias', 'encoder.layers.1.blocks.1.attn.proj.bias', 'encoder.layers.1.blocks.1.norm2.weight', 'encoder.layers.1.blocks.1.norm2.bias', 'encoder.layers.1.blocks.1.mlp.fc1.bias', 'encoder.layers.1.blocks.1.mlp.fc2.bias', 'encoder.layers.1.downsample.norm.weight', 'encoder.layers.1.downsample.norm.bias', 'encoder.layers.2.blocks.0.norm1.weight', 'encoder.layers.2.blocks.0.norm1.bias', 'encoder.layers.2.blocks.0.attn.qkv.bias', 'encoder.layers.2.blocks.0.attn.proj.bias', 'encoder.layers.2.blocks.0.norm2.weight', 'encoder.layers.2.blocks.0.norm2.bias', 'encoder.layers.2.blocks.0.mlp.fc1.bias', 'encoder.layers.2.blocks.0.mlp.fc2.bias', 'encoder.layers.2.blocks.1.norm1.weight', 'encoder.layers.2.blocks.1.norm1.bias', 'encoder.layers.2.blocks.1.attn.qkv.bias', 'encoder.layers.2.blocks.1.attn.proj.bias', 'encoder.layers.2.blocks.1.norm2.weight', 'encoder.layers.2.blocks.1.norm2.bias', 'encoder.layers.2.blocks.1.mlp.fc1.bias', 'encoder.layers.2.blocks.1.mlp.fc2.bias', 'encoder.layers.2.blocks.2.norm1.weight', 'encoder.layers.2.blocks.2.norm1.bias', 'encoder.layers.2.blocks.2.attn.qkv.bias', 'encoder.layers.2.blocks.2.attn.proj.bias', 'encoder.layers.2.blocks.2.norm2.weight', 'encoder.layers.2.blocks.2.norm2.bias', 'encoder.layers.2.blocks.2.mlp.fc1.bias', 'encoder.layers.2.blocks.2.mlp.fc2.bias', 'encoder.layers.2.blocks.3.norm1.weight', 'encoder.layers.2.blocks.3.norm1.bias', 'encoder.layers.2.blocks.3.attn.qkv.bias', 'encoder.layers.2.blocks.3.attn.proj.bias', 'encoder.layers.2.blocks.3.norm2.weight', 'encoder.layers.2.blocks.3.norm2.bias', 'encoder.layers.2.blocks.3.mlp.fc1.bias', 'encoder.layers.2.blocks.3.mlp.fc2.bias', 'encoder.layers.2.blocks.4.norm1.weight', 'encoder.layers.2.blocks.4.norm1.bias', 'encoder.layers.2.blocks.4.attn.qkv.bias', 'encoder.layers.2.blocks.4.attn.proj.bias', 'encoder.layers.2.blocks.4.norm2.weight', 'encoder.layers.2.blocks.4.norm2.bias', 'encoder.layers.2.blocks.4.mlp.fc1.bias', 'encoder.layers.2.blocks.4.mlp.fc2.bias', 'encoder.layers.2.blocks.5.norm1.weight', 'encoder.layers.2.blocks.5.norm1.bias', 'encoder.layers.2.blocks.5.attn.qkv.bias', 'encoder.layers.2.blocks.5.attn.proj.bias', 
'encoder.layers.2.blocks.5.norm2.weight', 'encoder.layers.2.blocks.5.norm2.bias', 'encoder.layers.2.blocks.5.mlp.fc1.bias', 'encoder.layers.2.blocks.5.mlp.fc2.bias', 'encoder.layers.2.blocks.6.norm1.weight', 'encoder.layers.2.blocks.6.norm1.bias', 'encoder.layers.2.blocks.6.attn.qkv.bias', 'encoder.layers.2.blocks.6.attn.proj.bias', 'encoder.layers.2.blocks.6.norm2.weight', 'encoder.layers.2.blocks.6.norm2.bias', 'encoder.layers.2.blocks.6.mlp.fc1.bias', 'encoder.layers.2.blocks.6.mlp.fc2.bias', 'encoder.layers.2.blocks.7.norm1.weight', 'encoder.layers.2.blocks.7.norm1.bias', 'encoder.layers.2.blocks.7.attn.qkv.bias', 'encoder.layers.2.blocks.7.attn.proj.bias', 'encoder.layers.2.blocks.7.norm2.weight', 'encoder.layers.2.blocks.7.norm2.bias', 'encoder.layers.2.blocks.7.mlp.fc1.bias', 'encoder.layers.2.blocks.7.mlp.fc2.bias', 'encoder.layers.2.blocks.8.norm1.weight', 'encoder.layers.2.blocks.8.norm1.bias', 'encoder.layers.2.blocks.8.attn.qkv.bias', 'encoder.layers.2.blocks.8.attn.proj.bias', 'encoder.layers.2.blocks.8.norm2.weight', 'encoder.layers.2.blocks.8.norm2.bias', 'encoder.layers.2.blocks.8.mlp.fc1.bias', 'encoder.layers.2.blocks.8.mlp.fc2.bias', 'encoder.layers.2.blocks.9.norm1.weight', 'encoder.layers.2.blocks.9.norm1.bias', 'encoder.layers.2.blocks.9.attn.qkv.bias', 'encoder.layers.2.blocks.9.attn.proj.bias', 'encoder.layers.2.blocks.9.norm2.weight', 'encoder.layers.2.blocks.9.norm2.bias', 'encoder.layers.2.blocks.9.mlp.fc1.bias', 'encoder.layers.2.blocks.9.mlp.fc2.bias', 'encoder.layers.2.blocks.10.norm1.weight', 'encoder.layers.2.blocks.10.norm1.bias', 'encoder.layers.2.blocks.10.attn.qkv.bias', 'encoder.layers.2.blocks.10.attn.proj.bias', 'encoder.layers.2.blocks.10.norm2.weight', 'encoder.layers.2.blocks.10.norm2.bias', 'encoder.layers.2.blocks.10.mlp.fc1.bias', 'encoder.layers.2.blocks.10.mlp.fc2.bias', 'encoder.layers.2.blocks.11.norm1.weight', 'encoder.layers.2.blocks.11.norm1.bias', 'encoder.layers.2.blocks.11.attn.qkv.bias', 'encoder.layers.2.blocks.11.attn.proj.bias', 'encoder.layers.2.blocks.11.norm2.weight', 'encoder.layers.2.blocks.11.norm2.bias', 'encoder.layers.2.blocks.11.mlp.fc1.bias', 'encoder.layers.2.blocks.11.mlp.fc2.bias', 'encoder.layers.2.blocks.12.norm1.weight', 'encoder.layers.2.blocks.12.norm1.bias', 'encoder.layers.2.blocks.12.attn.qkv.bias', 'encoder.layers.2.blocks.12.attn.proj.bias', 'encoder.layers.2.blocks.12.norm2.weight', 'encoder.layers.2.blocks.12.norm2.bias', 'encoder.layers.2.blocks.12.mlp.fc1.bias', 'encoder.layers.2.blocks.12.mlp.fc2.bias', 'encoder.layers.2.blocks.13.norm1.weight', 'encoder.layers.2.blocks.13.norm1.bias', 'encoder.layers.2.blocks.13.attn.qkv.bias', 'encoder.layers.2.blocks.13.attn.proj.bias', 'encoder.layers.2.blocks.13.norm2.weight', 'encoder.layers.2.blocks.13.norm2.bias', 'encoder.layers.2.blocks.13.mlp.fc1.bias', 'encoder.layers.2.blocks.13.mlp.fc2.bias', 'encoder.layers.2.blocks.14.norm1.weight', 'encoder.layers.2.blocks.14.norm1.bias', 'encoder.layers.2.blocks.14.attn.qkv.bias', 'encoder.layers.2.blocks.14.attn.proj.bias', 'encoder.layers.2.blocks.14.norm2.weight', 'encoder.layers.2.blocks.14.norm2.bias', 'encoder.layers.2.blocks.14.mlp.fc1.bias', 'encoder.layers.2.blocks.14.mlp.fc2.bias', 'encoder.layers.2.blocks.15.norm1.weight', 'encoder.layers.2.blocks.15.norm1.bias', 'encoder.layers.2.blocks.15.attn.qkv.bias', 'encoder.layers.2.blocks.15.attn.proj.bias', 'encoder.layers.2.blocks.15.norm2.weight', 'encoder.layers.2.blocks.15.norm2.bias', 'encoder.layers.2.blocks.15.mlp.fc1.bias', 
'encoder.layers.2.blocks.15.mlp.fc2.bias', 'encoder.layers.2.blocks.16.norm1.weight', 'encoder.layers.2.blocks.16.norm1.bias', 'encoder.layers.2.blocks.16.attn.qkv.bias', 'encoder.layers.2.blocks.16.attn.proj.bias', 'encoder.layers.2.blocks.16.norm2.weight', 'encoder.layers.2.blocks.16.norm2.bias', 'encoder.layers.2.blocks.16.mlp.fc1.bias', 'encoder.layers.2.blocks.16.mlp.fc2.bias', 'encoder.layers.2.blocks.17.norm1.weight', 'encoder.layers.2.blocks.17.norm1.bias', 'encoder.layers.2.blocks.17.attn.qkv.bias', 'encoder.layers.2.blocks.17.attn.proj.bias', 'encoder.layers.2.blocks.17.norm2.weight', 'encoder.layers.2.blocks.17.norm2.bias', 'encoder.layers.2.blocks.17.mlp.fc1.bias', 'encoder.layers.2.blocks.17.mlp.fc2.bias', 'encoder.layers.2.downsample.norm.weight', 'encoder.layers.2.downsample.norm.bias', 'encoder.layers.3.blocks.0.norm1.weight', 'encoder.layers.3.blocks.0.norm1.bias', 'encoder.layers.3.blocks.0.attn.qkv.bias', 'encoder.layers.3.blocks.0.attn.proj.bias', 'encoder.layers.3.blocks.0.norm2.weight', 'encoder.layers.3.blocks.0.norm2.bias', 'encoder.layers.3.blocks.0.mlp.fc1.bias', 'encoder.layers.3.blocks.0.mlp.fc2.bias', 'encoder.layers.3.blocks.1.norm1.weight', 'encoder.layers.3.blocks.1.norm1.bias', 'encoder.layers.3.blocks.1.attn.qkv.bias', 'encoder.layers.3.blocks.1.attn.proj.bias', 'encoder.layers.3.blocks.1.norm2.weight', 'encoder.layers.3.blocks.1.norm2.bias', 'encoder.layers.3.blocks.1.mlp.fc1.bias', 'encoder.layers.3.blocks.1.mlp.fc2.bias', 'encoder.norm.weight', 'encoder.norm.bias', 'decoder.0.bias']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 64): INFO Has decay params: ['encoder.patch_embed.proj.weight', 'encoder.layers.0.blocks.0.attn.relative_position_bias_table', 'encoder.layers.0.blocks.0.attn.qkv.weight', 'encoder.layers.0.blocks.0.attn.proj.weight', 'encoder.layers.0.blocks.0.mlp.fc1.weight', 'encoder.layers.0.blocks.0.mlp.fc2.weight', 'encoder.layers.0.blocks.1.attn.relative_position_bias_table', 'encoder.layers.0.blocks.1.attn.qkv.weight', 'encoder.layers.0.blocks.1.attn.proj.weight', 'encoder.layers.0.blocks.1.mlp.fc1.weight', 'encoder.layers.0.blocks.1.mlp.fc2.weight', 'encoder.layers.0.downsample.reduction.weight', 'encoder.layers.1.blocks.0.attn.relative_position_bias_table', 'encoder.layers.1.blocks.0.attn.qkv.weight', 'encoder.layers.1.blocks.0.attn.proj.weight', 'encoder.layers.1.blocks.0.mlp.fc1.weight', 'encoder.layers.1.blocks.0.mlp.fc2.weight', 'encoder.layers.1.blocks.1.attn.relative_position_bias_table', 'encoder.layers.1.blocks.1.attn.qkv.weight', 'encoder.layers.1.blocks.1.attn.proj.weight', 'encoder.layers.1.blocks.1.mlp.fc1.weight', 'encoder.layers.1.blocks.1.mlp.fc2.weight', 'encoder.layers.1.downsample.reduction.weight', 'encoder.layers.2.blocks.0.attn.relative_position_bias_table', 'encoder.layers.2.blocks.0.attn.qkv.weight', 'encoder.layers.2.blocks.0.attn.proj.weight', 'encoder.layers.2.blocks.0.mlp.fc1.weight', 'encoder.layers.2.blocks.0.mlp.fc2.weight', 'encoder.layers.2.blocks.1.attn.relative_position_bias_table', 'encoder.layers.2.blocks.1.attn.qkv.weight', 'encoder.layers.2.blocks.1.attn.proj.weight', 'encoder.layers.2.blocks.1.mlp.fc1.weight', 'encoder.layers.2.blocks.1.mlp.fc2.weight', 'encoder.layers.2.blocks.2.attn.relative_position_bias_table', 'encoder.layers.2.blocks.2.attn.qkv.weight', 'encoder.layers.2.blocks.2.attn.proj.weight', 'encoder.layers.2.blocks.2.mlp.fc1.weight', 'encoder.layers.2.blocks.2.mlp.fc2.weight', 'encoder.layers.2.blocks.3.attn.relative_position_bias_table', 'encoder.layers.2.blocks.3.attn.qkv.weight', 'encoder.layers.2.blocks.3.attn.proj.weight', 'encoder.layers.2.blocks.3.mlp.fc1.weight', 'encoder.layers.2.blocks.3.mlp.fc2.weight', 'encoder.layers.2.blocks.4.attn.relative_position_bias_table', 'encoder.layers.2.blocks.4.attn.qkv.weight', 'encoder.layers.2.blocks.4.attn.proj.weight', 'encoder.layers.2.blocks.4.mlp.fc1.weight', 'encoder.layers.2.blocks.4.mlp.fc2.weight', 'encoder.layers.2.blocks.5.attn.relative_position_bias_table', 'encoder.layers.2.blocks.5.attn.qkv.weight', 'encoder.layers.2.blocks.5.attn.proj.weight', 'encoder.layers.2.blocks.5.mlp.fc1.weight', 'encoder.layers.2.blocks.5.mlp.fc2.weight', 'encoder.layers.2.blocks.6.attn.relative_position_bias_table', 'encoder.layers.2.blocks.6.attn.qkv.weight', 'encoder.layers.2.blocks.6.attn.proj.weight', 'encoder.layers.2.blocks.6.mlp.fc1.weight', 'encoder.layers.2.blocks.6.mlp.fc2.weight', 'encoder.layers.2.blocks.7.attn.relative_position_bias_table', 'encoder.layers.2.blocks.7.attn.qkv.weight', 'encoder.layers.2.blocks.7.attn.proj.weight', 'encoder.layers.2.blocks.7.mlp.fc1.weight', 'encoder.layers.2.blocks.7.mlp.fc2.weight', 'encoder.layers.2.blocks.8.attn.relative_position_bias_table', 'encoder.layers.2.blocks.8.attn.qkv.weight', 'encoder.layers.2.blocks.8.attn.proj.weight', 'encoder.layers.2.blocks.8.mlp.fc1.weight', 'encoder.layers.2.blocks.8.mlp.fc2.weight', 'encoder.layers.2.blocks.9.attn.relative_position_bias_table', 'encoder.layers.2.blocks.9.attn.qkv.weight', 'encoder.layers.2.blocks.9.attn.proj.weight', 
'encoder.layers.2.blocks.9.mlp.fc1.weight', 'encoder.layers.2.blocks.9.mlp.fc2.weight', 'encoder.layers.2.blocks.10.attn.relative_position_bias_table', 'encoder.layers.2.blocks.10.attn.qkv.weight', 'encoder.layers.2.blocks.10.attn.proj.weight', 'encoder.layers.2.blocks.10.mlp.fc1.weight', 'encoder.layers.2.blocks.10.mlp.fc2.weight', 'encoder.layers.2.blocks.11.attn.relative_position_bias_table', 'encoder.layers.2.blocks.11.attn.qkv.weight', 'encoder.layers.2.blocks.11.attn.proj.weight', 'encoder.layers.2.blocks.11.mlp.fc1.weight', 'encoder.layers.2.blocks.11.mlp.fc2.weight', 'encoder.layers.2.blocks.12.attn.relative_position_bias_table', 'encoder.layers.2.blocks.12.attn.qkv.weight', 'encoder.layers.2.blocks.12.attn.proj.weight', 'encoder.layers.2.blocks.12.mlp.fc1.weight', 'encoder.layers.2.blocks.12.mlp.fc2.weight', 'encoder.layers.2.blocks.13.attn.relative_position_bias_table', 'encoder.layers.2.blocks.13.attn.qkv.weight', 'encoder.layers.2.blocks.13.attn.proj.weight', 'encoder.layers.2.blocks.13.mlp.fc1.weight', 'encoder.layers.2.blocks.13.mlp.fc2.weight', 'encoder.layers.2.blocks.14.attn.relative_position_bias_table', 'encoder.layers.2.blocks.14.attn.qkv.weight', 'encoder.layers.2.blocks.14.attn.proj.weight', 'encoder.layers.2.blocks.14.mlp.fc1.weight', 'encoder.layers.2.blocks.14.mlp.fc2.weight', 'encoder.layers.2.blocks.15.attn.relative_position_bias_table', 'encoder.layers.2.blocks.15.attn.qkv.weight', 'encoder.layers.2.blocks.15.attn.proj.weight', 'encoder.layers.2.blocks.15.mlp.fc1.weight', 'encoder.layers.2.blocks.15.mlp.fc2.weight', 'encoder.layers.2.blocks.16.attn.relative_position_bias_table', 'encoder.layers.2.blocks.16.attn.qkv.weight', 'encoder.layers.2.blocks.16.attn.proj.weight', 'encoder.layers.2.blocks.16.mlp.fc1.weight', 'encoder.layers.2.blocks.16.mlp.fc2.weight', 'encoder.layers.2.blocks.17.attn.relative_position_bias_table', 'encoder.layers.2.blocks.17.attn.qkv.weight', 'encoder.layers.2.blocks.17.attn.proj.weight', 'encoder.layers.2.blocks.17.mlp.fc1.weight', 'encoder.layers.2.blocks.17.mlp.fc2.weight', 'encoder.layers.2.downsample.reduction.weight', 'encoder.layers.3.blocks.0.attn.relative_position_bias_table', 'encoder.layers.3.blocks.0.attn.qkv.weight', 'encoder.layers.3.blocks.0.attn.proj.weight', 'encoder.layers.3.blocks.0.mlp.fc1.weight', 'encoder.layers.3.blocks.0.mlp.fc2.weight', 'encoder.layers.3.blocks.1.attn.relative_position_bias_table', 'encoder.layers.3.blocks.1.attn.qkv.weight', 'encoder.layers.3.blocks.1.attn.proj.weight', 'encoder.layers.3.blocks.1.mlp.fc1.weight', 'encoder.layers.3.blocks.1.mlp.fc2.weight', 'decoder.0.weight']
[2022-02-05 09:22:26 simmim_pretrain] (optimizer.py 43): INFO AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.05

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0008
    weight_decay: 0.0
)
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 83): INFO number of params: 89874104
[2022-02-05 09:22:26 simmim_pretrain] (utils.py 81): INFO All checkpoints founded in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep: []
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 100): INFO no checkpoint found in output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep, ignoring auto resume
[2022-02-05 09:22:26 simmim_pretrain] (main_simmim.py 105): INFO Start training
[2022-02-05 09:24:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][0/1251]	eta 1 day, 15:53:49 lr 0.000004	time 114.8121 (114.8121)	loss 0.5543 (0.5543)	grad_norm 0.2902 (0.2902)	mem 17192MB
[2022-02-05 09:45:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][100/1251]	eta 4:24:36 lr 0.000010	time 0.3949 (13.7934)	loss 0.4499 (0.4969)	grad_norm 1.0401 (0.2900)	mem 18238MB
[2022-02-05 10:06:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][200/1251]	eta 3:52:30 lr 0.000017	time 75.5072 (13.2732)	loss 0.3752 (0.4565)	grad_norm 2.8639 (1.6425)	mem 18238MB
[2022-02-05 10:28:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][300/1251]	eta 3:27:27 lr 0.000023	time 0.3941 (13.0894)	loss 0.3553 (0.4264)	grad_norm 2.0591 (2.8358)	mem 18238MB
[2022-02-05 10:48:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][400/1251]	eta 3:02:30 lr 0.000029	time 57.4084 (12.8679)	loss 0.3173 (0.4040)	grad_norm 1.1405 (3.6005)	mem 18238MB
[2022-02-05 11:08:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][500/1251]	eta 2:38:59 lr 0.000036	time 0.3942 (12.7019)	loss 0.3129 (0.3879)	grad_norm 4.7302 (4.0156)	mem 18238MB
[2022-02-05 11:29:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][600/1251]	eta 2:17:56 lr 0.000042	time 86.9880 (12.7132)	loss 0.3042 (0.3741)	grad_norm 2.4576 (4.0197)	mem 18238MB
[2022-02-05 11:49:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][700/1251]	eta 1:55:17 lr 0.000048	time 0.3943 (12.5542)	loss 0.2920 (0.3630)	grad_norm 4.6089 (4.0017)	mem 18239MB
[2022-02-05 12:09:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][800/1251]	eta 1:34:10 lr 0.000055	time 73.9639 (12.5290)	loss 0.2979 (0.3536)	grad_norm 3.4510 (3.9055)	mem 18239MB
[2022-02-05 12:29:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][900/1251]	eta 1:13:00 lr 0.000061	time 0.3981 (12.4787)	loss 0.2693 (0.3459)	grad_norm 1.5775 (3.8091)	mem 18239MB
[2022-02-05 12:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1000/1251]	eta 0:52:00 lr 0.000068	time 18.3918 (12.4334)	loss 0.2786 (0.3394)	grad_norm 1.2491 (3.7356)	mem 18239MB
[2022-02-05 13:10:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1100/1251]	eta 0:31:18 lr 0.000074	time 0.4033 (12.4426)	loss 0.2725 (0.3335)	grad_norm 2.2311 (3.6312)	mem 18239MB
[2022-02-05 13:30:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [0/100][1200/1251]	eta 0:10:32 lr 0.000080	time 31.6500 (12.4020)	loss 0.2715 (0.3286)	grad_norm 1.2720 (3.5534)	mem 18239MB
[2022-02-05 13:39:44 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 0 training takes 4:17:18
[2022-02-05 13:39:44 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saving......
[2022-02-05 13:39:46 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_0.pth saved !!!
[2022-02-05 13:39:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][0/1251]	eta 1:01:34 lr 0.000083	time 2.9530 (2.9530)	loss 0.2705 (0.2705)	grad_norm 0.8280 (0.8280)	mem 18239MB
[2022-02-05 13:41:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][100/1251]	eta 0:14:17 lr 0.000090	time 0.6114 (0.7453)	loss 0.2802 (0.2693)	grad_norm 3.6450 (2.3059)	mem 18239MB
[2022-02-05 13:42:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][200/1251]	eta 0:14:40 lr 0.000096	time 0.7879 (0.8375)	loss 0.2727 (0.2691)	grad_norm 2.2279 (2.2994)	mem 18239MB
[2022-02-05 13:44:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][300/1251]	eta 0:13:41 lr 0.000103	time 0.4401 (0.8638)	loss 0.2757 (0.2682)	grad_norm 1.1539 (2.2752)	mem 18239MB
[2022-02-05 13:45:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][400/1251]	eta 0:11:34 lr 0.000109	time 0.4306 (0.8162)	loss 0.2588 (0.2672)	grad_norm 1.2593 (2.2458)	mem 18239MB
[2022-02-05 13:46:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][500/1251]	eta 0:10:38 lr 0.000115	time 0.5900 (0.8503)	loss 0.2552 (0.2668)	grad_norm 1.4727 (2.2056)	mem 18240MB
[2022-02-05 13:47:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][600/1251]	eta 0:08:45 lr 0.000122	time 0.4254 (0.8066)	loss 0.2584 (0.2662)	grad_norm 1.1834 (2.1712)	mem 18240MB
[2022-02-05 13:48:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][700/1251]	eta 0:06:56 lr 0.000128	time 0.4058 (0.7558)	loss 0.2641 (0.2653)	grad_norm 1.1315 (2.1186)	mem 18240MB
[2022-02-05 13:49:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][800/1251]	eta 0:05:41 lr 0.000134	time 0.4352 (0.7570)	loss 0.2742 (0.2649)	grad_norm 0.7488 (2.0964)	mem 18240MB
[2022-02-05 13:51:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][900/1251]	eta 0:04:35 lr 0.000141	time 0.4130 (0.7842)	loss 0.2476 (0.2644)	grad_norm 0.6401 (2.0539)	mem 18240MB
[2022-02-05 13:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1000/1251]	eta 0:03:08 lr 0.000147	time 0.4153 (0.7508)	loss 0.2717 (0.2639)	grad_norm 2.2334 (2.0098)	mem 18240MB
[2022-02-05 13:53:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1100/1251]	eta 0:01:51 lr 0.000154	time 0.4521 (0.7393)	loss 0.2551 (0.2633)	grad_norm 1.4980 (1.9817)	mem 18240MB
[2022-02-05 13:55:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [1/100][1200/1251]	eta 0:00:39 lr 0.000160	time 0.4667 (0.7788)	loss 0.2664 (0.2627)	grad_norm 0.7340 (1.9572)	mem 18240MB
[2022-02-05 13:56:06 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 1 training takes 0:16:20
[2022-02-05 13:56:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][0/1251]	eta 1:16:36 lr 0.000163	time 3.6739 (3.6739)	loss 0.2620 (0.2620)	grad_norm 0.9611 (0.9611)	mem 18240MB
[2022-02-05 13:56:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][100/1251]	eta 0:09:22 lr 0.000169	time 0.4276 (0.4883)	loss 0.2562 (0.2552)	grad_norm 0.5311 (1.6903)	mem 18240MB
[2022-02-05 13:58:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][200/1251]	eta 0:11:20 lr 0.000176	time 0.4207 (0.6473)	loss 0.2618 (0.2542)	grad_norm 0.6081 (1.6235)	mem 18240MB
[2022-02-05 13:59:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][300/1251]	eta 0:09:36 lr 0.000182	time 0.4451 (0.6061)	loss 0.2528 (0.2531)	grad_norm 0.4520 (1.6033)	mem 18240MB
[2022-02-05 14:00:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][400/1251]	eta 0:09:29 lr 0.000189	time 0.4445 (0.6689)	loss 0.2413 (0.2525)	grad_norm 0.6562 (1.5654)	mem 18240MB
[2022-02-05 14:01:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][500/1251]	eta 0:08:08 lr 0.000195	time 2.1151 (0.6503)	loss 0.2539 (0.2520)	grad_norm 1.8790 (1.5394)	mem 18240MB
[2022-02-05 14:03:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][600/1251]	eta 0:07:55 lr 0.000201	time 0.5791 (0.7308)	loss 0.2295 (0.2516)	grad_norm 1.3565 (1.5373)	mem 18240MB
[2022-02-05 14:04:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][700/1251]	eta 0:06:53 lr 0.000208	time 0.4240 (0.7508)	loss 0.2464 (0.2511)	grad_norm 0.5189 (1.5236)	mem 18240MB
[2022-02-05 14:06:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][800/1251]	eta 0:05:48 lr 0.000214	time 2.2608 (0.7731)	loss 0.2481 (0.2507)	grad_norm 0.4695 (1.4909)	mem 18240MB
[2022-02-05 14:08:06 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][900/1251]	eta 0:04:40 lr 0.000220	time 0.4068 (0.7993)	loss 0.2637 (0.2503)	grad_norm 1.5514 (1.4829)	mem 18240MB
[2022-02-05 14:09:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1000/1251]	eta 0:03:23 lr 0.000227	time 1.3535 (0.8115)	loss 0.2443 (0.2498)	grad_norm 0.9744 (1.4653)	mem 18240MB
[2022-02-05 14:11:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1100/1251]	eta 0:02:04 lr 0.000233	time 0.4410 (0.8271)	loss 0.2425 (0.2491)	grad_norm 1.9947 (1.4427)	mem 18240MB
[2022-02-05 14:12:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [2/100][1200/1251]	eta 0:00:42 lr 0.000239	time 0.4275 (0.8318)	loss 0.2465 (0.2486)	grad_norm 0.6265 (1.4366)	mem 18240MB
[2022-02-05 14:13:21 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 2 training takes 0:17:15
[2022-02-05 14:13:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][0/1251]	eta 1:24:40 lr 0.000243	time 4.0614 (4.0614)	loss 0.2433 (0.2433)	grad_norm 0.7067 (0.7067)	mem 18240MB
[2022-02-05 14:14:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][100/1251]	eta 0:09:27 lr 0.000249	time 0.4092 (0.4935)	loss 0.2476 (0.2430)	grad_norm 1.1159 (1.3134)	mem 18240MB
[2022-02-05 14:15:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][200/1251]	eta 0:10:07 lr 0.000255	time 0.3964 (0.5784)	loss 0.2400 (0.2425)	grad_norm 0.3384 (1.2386)	mem 18240MB
[2022-02-05 14:16:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][300/1251]	eta 0:08:53 lr 0.000262	time 0.4966 (0.5605)	loss 0.2404 (0.2416)	grad_norm 0.3401 (1.1964)	mem 18240MB
[2022-02-05 14:17:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][400/1251]	eta 0:09:07 lr 0.000268	time 0.7116 (0.6430)	loss 0.2314 (0.2411)	grad_norm 1.4900 (1.2040)	mem 18240MB
[2022-02-05 14:19:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][500/1251]	eta 0:09:34 lr 0.000275	time 0.4066 (0.7646)	loss 0.2282 (0.2405)	grad_norm 0.5011 (1.2036)	mem 18240MB
[2022-02-05 14:21:30 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][600/1251]	eta 0:08:49 lr 0.000281	time 0.4160 (0.8126)	loss 0.2414 (0.2404)	grad_norm 0.9795 (1.1974)	mem 18240MB
[2022-02-05 14:23:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][700/1251]	eta 0:07:41 lr 0.000287	time 0.4862 (0.8377)	loss 0.2334 (0.2401)	grad_norm 0.4512 (1.1759)	mem 18240MB
[2022-02-05 14:24:49 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][800/1251]	eta 0:06:27 lr 0.000294	time 0.5067 (0.8583)	loss 0.2418 (0.2398)	grad_norm 1.2394 (1.1746)	mem 18240MB
[2022-02-05 14:26:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][900/1251]	eta 0:05:03 lr 0.000300	time 0.4366 (0.8635)	loss 0.2361 (0.2394)	grad_norm 0.5397 (1.1654)	mem 18240MB
[2022-02-05 14:27:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1000/1251]	eta 0:03:37 lr 0.000306	time 1.1073 (0.8658)	loss 0.2352 (0.2390)	grad_norm 0.6021 (1.1541)	mem 18241MB
[2022-02-05 14:29:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1100/1251]	eta 0:02:08 lr 0.000313	time 0.5436 (0.8526)	loss 0.2295 (0.2387)	grad_norm 1.1236 (1.1382)	mem 18241MB
[2022-02-05 14:30:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [3/100][1200/1251]	eta 0:00:44 lr 0.000319	time 0.4830 (0.8667)	loss 0.2486 (0.2385)	grad_norm 0.4466 (1.1277)	mem 18241MB
[2022-02-05 14:31:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 3 training takes 0:18:11
[2022-02-05 14:31:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][0/1251]	eta 1:30:19 lr 0.000322	time 4.3320 (4.3320)	loss 0.2309 (0.2309)	grad_norm 0.3473 (0.3473)	mem 18241MB
[2022-02-05 14:32:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][100/1251]	eta 0:09:42 lr 0.000329	time 0.4199 (0.5059)	loss 0.2313 (0.2347)	grad_norm 0.9780 (0.9537)	mem 18241MB
[2022-02-05 14:33:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][200/1251]	eta 0:10:42 lr 0.000335	time 0.4042 (0.6115)	loss 0.2380 (0.2338)	grad_norm 0.4685 (0.9641)	mem 18241MB
[2022-02-05 14:35:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][300/1251]	eta 0:12:57 lr 0.000341	time 0.4448 (0.8171)	loss 0.2274 (0.2339)	grad_norm 0.5854 (0.9808)	mem 18241MB
[2022-02-05 14:37:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][400/1251]	eta 0:12:15 lr 0.000348	time 0.9284 (0.8642)	loss 0.2300 (0.2342)	grad_norm 0.5273 (0.9884)	mem 18241MB
[2022-02-05 14:38:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][500/1251]	eta 0:10:11 lr 0.000354	time 0.4123 (0.8136)	loss 0.2346 (0.2337)	grad_norm 0.7111 (0.9791)	mem 18241MB
[2022-02-05 14:39:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][600/1251]	eta 0:09:01 lr 0.000361	time 0.4479 (0.8323)	loss 0.2305 (0.2336)	grad_norm 0.7723 (0.9726)	mem 18241MB
[2022-02-05 14:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][700/1251]	eta 0:07:37 lr 0.000367	time 1.2561 (0.8298)	loss 0.2416 (0.2333)	grad_norm 0.7113 (0.9652)	mem 18241MB
[2022-02-05 14:43:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][800/1251]	eta 0:06:29 lr 0.000373	time 2.0054 (0.8637)	loss 0.2229 (0.2332)	grad_norm 0.3053 (0.9582)	mem 18241MB
[2022-02-05 14:44:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][900/1251]	eta 0:05:07 lr 0.000380	time 2.2077 (0.8764)	loss 0.2203 (0.2330)	grad_norm 0.9912 (0.9536)	mem 18241MB
[2022-02-05 14:46:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1000/1251]	eta 0:03:41 lr 0.000386	time 0.4317 (0.8842)	loss 0.2330 (0.2327)	grad_norm 0.4332 (0.9454)	mem 18241MB
[2022-02-05 14:47:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1100/1251]	eta 0:02:09 lr 0.000392	time 1.9930 (0.8594)	loss 0.2376 (0.2325)	grad_norm 0.3494 (0.9425)	mem 18241MB
[2022-02-05 14:49:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [4/100][1200/1251]	eta 0:00:44 lr 0.000399	time 3.0251 (0.8816)	loss 0.2229 (0.2322)	grad_norm 0.3280 (0.9404)	mem 18241MB
[2022-02-05 14:49:56 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 4 training takes 0:18:23
[2022-02-05 14:50:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][0/1251]	eta 1:20:39 lr 0.000402	time 3.8685 (3.8685)	loss 0.2361 (0.2361)	grad_norm 0.2441 (0.2441)	mem 18241MB
[2022-02-05 14:50:45 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][100/1251]	eta 0:09:10 lr 0.000408	time 0.4087 (0.4786)	loss 0.2268 (0.2297)	grad_norm 0.4077 (0.9384)	mem 18241MB
[2022-02-05 14:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][200/1251]	eta 0:11:06 lr 0.000415	time 0.4741 (0.6344)	loss 0.2332 (0.2293)	grad_norm 0.6293 (0.8776)	mem 18241MB
[2022-02-05 14:53:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][300/1251]	eta 0:09:43 lr 0.000421	time 0.4483 (0.6141)	loss 0.4291 (0.2770)	grad_norm 0.1236 (nan)	mem 18241MB
[2022-02-05 14:54:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][400/1251]	eta 0:09:25 lr 0.000427	time 0.4158 (0.6646)	loss 0.4575 (0.3163)	grad_norm 0.7630 (nan)	mem 18241MB
[2022-02-05 14:55:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][500/1251]	eta 0:07:59 lr 0.000434	time 0.4047 (0.6385)	loss 0.4399 (0.3400)	grad_norm 1.2759 (nan)	mem 18241MB
[2022-02-05 14:57:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][600/1251]	eta 0:07:54 lr 0.000440	time 0.3981 (0.7282)	loss 0.5130 (0.3663)	grad_norm 0.0916 (nan)	mem 18241MB
[2022-02-05 14:58:29 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][700/1251]	eta 0:06:42 lr 0.000446	time 0.4456 (0.7310)	loss 0.4537 (0.3812)	grad_norm 0.5641 (nan)	mem 18241MB
[2022-02-05 14:59:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][800/1251]	eta 0:05:14 lr 0.000453	time 0.4507 (0.6973)	loss 0.5098 (0.3919)	grad_norm 0.3081 (nan)	mem 18241MB
[2022-02-05 15:01:09 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][900/1251]	eta 0:04:22 lr 0.000459	time 0.4445 (0.7472)	loss 0.4913 (0.4046)	grad_norm 0.0084 (nan)	mem 18241MB
[2022-02-05 15:03:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1000/1251]	eta 0:03:16 lr 0.000466	time 3.5568 (0.7843)	loss 0.5040 (0.4146)	grad_norm 0.0242 (nan)	mem 18241MB
[2022-02-05 15:04:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1100/1251]	eta 0:01:58 lr 0.000472	time 0.4314 (0.7874)	loss 0.4915 (0.4226)	grad_norm 0.2421 (nan)	mem 18241MB
[2022-02-05 15:05:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [5/100][1200/1251]	eta 0:00:38 lr 0.000478	time 0.4317 (0.7606)	loss 0.5508 (0.4266)	grad_norm 0.0683 (nan)	mem 18241MB
[2022-02-05 15:05:33 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 5 training takes 0:15:37
[2022-02-05 15:05:33 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saving......
[2022-02-05 15:05:36 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_5.pth saved !!!
[2022-02-05 15:05:40 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][0/1251]	eta 1:23:58 lr 0.000481	time 4.0272 (4.0272)	loss 0.4738 (0.4738)	grad_norm 1.3644 (1.3644)	mem 18241MB
[2022-02-05 15:07:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][100/1251]	eta 0:16:57 lr 0.000488	time 0.4414 (0.8841)	loss 0.4598 (0.4497)	grad_norm 0.1361 (nan)	mem 18241MB
[2022-02-05 15:08:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][200/1251]	eta 0:16:22 lr 0.000494	time 0.4016 (0.9352)	loss 0.5048 (0.4722)	grad_norm 0.0122 (nan)	mem 18241MB
[2022-02-05 15:10:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][300/1251]	eta 0:14:59 lr 0.000501	time 1.8795 (0.9461)	loss 0.4727 (0.4830)	grad_norm 0.0052 (nan)	mem 18241MB
[2022-02-05 15:12:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][400/1251]	eta 0:13:44 lr 0.000507	time 1.0088 (0.9684)	loss 0.4677 (0.4865)	grad_norm 0.0896 (nan)	mem 18241MB
[2022-02-05 15:13:35 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][500/1251]	eta 0:11:58 lr 0.000513	time 0.4878 (0.9563)	loss 0.5154 (0.4816)	grad_norm 24.0200 (nan)	mem 18241MB
[2022-02-05 15:15:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][600/1251]	eta 0:10:28 lr 0.000520	time 0.4229 (0.9660)	loss 0.4594 (0.4808)	grad_norm 0.9102 (nan)	mem 18243MB
[2022-02-05 15:16:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][700/1251]	eta 0:08:53 lr 0.000526	time 6.5413 (0.9676)	loss 0.4411 (0.4790)	grad_norm 0.7869 (nan)	mem 18243MB
[2022-02-05 15:18:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][800/1251]	eta 0:07:16 lr 0.000532	time 0.4438 (0.9674)	loss 0.4367 (0.4746)	grad_norm 1.4051 (nan)	mem 18243MB
[2022-02-05 15:20:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][900/1251]	eta 0:05:41 lr 0.000539	time 3.8164 (0.9736)	loss 0.4383 (0.4707)	grad_norm 0.0261 (nan)	mem 18243MB
[2022-02-05 15:21:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1000/1251]	eta 0:04:03 lr 0.000545	time 1.6960 (0.9705)	loss 0.4484 (0.4665)	grad_norm 21.2195 (nan)	mem 18243MB
[2022-02-05 15:23:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1100/1251]	eta 0:02:26 lr 0.000552	time 0.4055 (0.9699)	loss 0.4562 (0.4642)	grad_norm 1.7039 (nan)	mem 18243MB
[2022-02-05 15:24:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [6/100][1200/1251]	eta 0:00:49 lr 0.000558	time 0.6191 (0.9622)	loss 0.4597 (0.4641)	grad_norm 1.6285 (nan)	mem 18243MB
[2022-02-05 15:25:37 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 6 training takes 0:20:01
[2022-02-05 15:25:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][0/1251]	eta 1:24:51 lr 0.000561	time 4.0702 (4.0702)	loss 0.4520 (0.4520)	grad_norm 0.2361 (0.2361)	mem 18243MB
[2022-02-05 15:26:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][100/1251]	eta 0:09:22 lr 0.000567	time 0.4117 (0.4889)	loss 0.4651 (0.4644)	grad_norm 0.0263 (1.5361)	mem 18243MB
[2022-02-05 15:27:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][200/1251]	eta 0:08:14 lr 0.000574	time 0.5824 (0.4702)	loss 0.4427 (0.4608)	grad_norm 0.4953 (2.5894)	mem 18243MB
[2022-02-05 15:27:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][300/1251]	eta 0:07:20 lr 0.000580	time 0.4171 (0.4631)	loss 0.4863 (0.4698)	grad_norm 0.0398 (2.0401)	mem 18243MB
[2022-02-05 15:30:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][400/1251]	eta 0:09:53 lr 0.000587	time 0.4244 (0.6975)	loss 0.4536 (0.4673)	grad_norm 0.3069 (1.8147)	mem 18243MB
[2022-02-05 15:32:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][500/1251]	eta 0:09:52 lr 0.000593	time 0.4264 (0.7884)	loss 0.4325 (0.4624)	grad_norm 0.1347 (2.3873)	mem 18243MB
[2022-02-05 15:33:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][600/1251]	eta 0:08:55 lr 0.000599	time 1.4852 (0.8225)	loss 0.4847 (0.4582)	grad_norm 9.4087 (2.4801)	mem 18243MB
[2022-02-05 15:35:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][700/1251]	eta 0:07:38 lr 0.000606	time 0.4803 (0.8325)	loss 0.4958 (0.4663)	grad_norm 0.0721 (2.2051)	mem 18243MB
[2022-02-05 15:37:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][800/1251]	eta 0:06:24 lr 0.000612	time 0.4407 (0.8519)	loss 0.5036 (0.4695)	grad_norm 0.0279 (2.1894)	mem 18243MB
[2022-02-05 15:38:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][900/1251]	eta 0:05:05 lr 0.000618	time 1.0429 (0.8691)	loss 0.4598 (0.4725)	grad_norm 0.3491 (1.9502)	mem 18243MB
[2022-02-05 15:40:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1000/1251]	eta 0:03:39 lr 0.000625	time 0.4146 (0.8731)	loss 0.4447 (0.4727)	grad_norm 0.0666 (1.8049)	mem 18243MB
[2022-02-05 15:41:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1100/1251]	eta 0:02:13 lr 0.000631	time 0.4089 (0.8846)	loss 0.5773 (0.4706)	grad_norm 533.4438 (2.1825)	mem 18243MB
[2022-02-05 15:43:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [7/100][1200/1251]	eta 0:00:45 lr 0.000637	time 0.4819 (0.8905)	loss 0.4459 (0.4708)	grad_norm 0.2206 (inf)	mem 18243MB
[2022-02-05 15:44:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 7 training takes 0:18:32
[2022-02-05 15:44:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][0/1251]	eta 1:22:31 lr 0.000641	time 3.9582 (3.9582)	loss 0.4379 (0.4379)	grad_norm 0.1034 (0.1034)	mem 18243MB
[2022-02-05 15:44:59 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][100/1251]	eta 0:09:21 lr 0.000647	time 0.4268 (0.4878)	loss 0.4240 (0.4471)	grad_norm 0.1163 (0.5972)	mem 18243MB
[2022-02-05 15:46:32 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][200/1251]	eta 0:12:24 lr 0.000653	time 0.4756 (0.7080)	loss 0.5335 (0.4479)	grad_norm 0.6204 (5.4569)	mem 18243MB
[2022-02-05 15:47:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][300/1251]	eta 0:09:50 lr 0.000660	time 0.4158 (0.6213)	loss 0.5053 (0.4720)	grad_norm 0.0163 (3.8024)	mem 18243MB
[2022-02-05 15:48:02 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][400/1251]	eta 0:08:11 lr 0.000666	time 0.4401 (0.5773)	loss 0.4971 (0.4803)	grad_norm 0.0055 (2.8562)	mem 18243MB
[2022-02-05 15:49:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][500/1251]	eta 0:07:22 lr 0.000673	time 1.8764 (0.5890)	loss 0.5002 (0.4848)	grad_norm 0.0067 (2.2872)	mem 18243MB
[2022-02-05 15:51:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][600/1251]	eta 0:07:31 lr 0.000679	time 0.4090 (0.6942)	loss 0.4947 (0.4882)	grad_norm 0.0027 (1.9076)	mem 18243MB
[2022-02-05 15:52:53 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][700/1251]	eta 0:06:50 lr 0.000685	time 2.9712 (0.7456)	loss 0.5094 (0.4906)	grad_norm 0.0018 (1.6364)	mem 18243MB
[2022-02-05 15:53:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][800/1251]	eta 0:05:29 lr 0.000692	time 0.5231 (0.7305)	loss 0.5050 (0.4927)	grad_norm 0.0023 (1.4328)	mem 18243MB
[2022-02-05 15:55:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][900/1251]	eta 0:04:29 lr 0.000698	time 0.4867 (0.7679)	loss 0.5158 (0.4942)	grad_norm 0.0031 (1.2744)	mem 18243MB
[2022-02-05 15:57:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1000/1251]	eta 0:03:18 lr 0.000704	time 2.5024 (0.7920)	loss 0.5137 (0.4952)	grad_norm 0.0069 (1.1477)	mem 18243MB
[2022-02-05 15:58:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1100/1251]	eta 0:01:56 lr 0.000711	time 0.4010 (0.7713)	loss 0.5179 (0.4962)	grad_norm 0.0027 (1.0440)	mem 18243MB
[2022-02-05 16:00:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [8/100][1200/1251]	eta 0:00:40 lr 0.000717	time 0.3946 (0.7944)	loss 0.5119 (0.4969)	grad_norm 0.0025 (0.9576)	mem 18243MB
[2022-02-05 16:00:52 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 8 training takes 0:16:41
[2022-02-05 16:00:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][0/1251]	eta 1:20:40 lr 0.000720	time 3.8696 (3.8696)	loss 0.4855 (0.4855)	grad_norm 0.0029 (0.0029)	mem 18243MB
[2022-02-05 16:01:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][100/1251]	eta 0:09:21 lr 0.000727	time 0.4418 (0.4877)	loss 0.5036 (0.5047)	grad_norm 0.0077 (0.0063)	mem 18243MB
[2022-02-05 16:02:26 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][200/1251]	eta 0:08:13 lr 0.000733	time 0.4427 (0.4691)	loss 0.5000 (0.5045)	grad_norm 0.0043 (0.0063)	mem 18243MB
[2022-02-05 16:03:12 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][300/1251]	eta 0:07:20 lr 0.000739	time 0.4323 (0.4635)	loss 0.5210 (0.5047)	grad_norm 0.0036 (0.0064)	mem 18243MB
[2022-02-05 16:05:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][400/1251]	eta 0:08:49 lr 0.000746	time 0.4176 (0.6220)	loss 0.4839 (0.5052)	grad_norm 0.0049 (0.0073)	mem 18243MB
[2022-02-05 16:06:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][500/1251]	eta 0:09:07 lr 0.000752	time 0.4086 (0.7284)	loss 0.4946 (0.5054)	grad_norm 0.0034 (0.0072)	mem 18243MB
[2022-02-05 16:08:31 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][600/1251]	eta 0:08:17 lr 0.000759	time 0.4523 (0.7643)	loss 0.5037 (0.5055)	grad_norm 0.0185 (0.0070)	mem 18243MB
[2022-02-05 16:10:10 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][700/1251]	eta 0:07:18 lr 0.000765	time 0.4846 (0.7965)	loss 0.5141 (0.5057)	grad_norm 0.0029 (0.0071)	mem 18243MB
[2022-02-05 16:11:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][800/1251]	eta 0:06:11 lr 0.000771	time 0.5237 (0.8228)	loss 0.4947 (0.5055)	grad_norm 0.0037 (0.0071)	mem 18243MB
[2022-02-05 16:13:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][900/1251]	eta 0:04:50 lr 0.000778	time 0.4529 (0.8269)	loss 0.5303 (0.5055)	grad_norm 0.0031 (0.0073)	mem 18243MB
[2022-02-05 16:14:44 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1000/1251]	eta 0:03:28 lr 0.000784	time 5.7999 (0.8313)	loss 0.5151 (0.5056)	grad_norm 0.0050 (0.0074)	mem 18243MB
[2022-02-05 16:16:19 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1100/1251]	eta 0:02:07 lr 0.000790	time 1.1566 (0.8422)	loss 0.4930 (0.5055)	grad_norm 0.0044 (0.0074)	mem 18243MB
[2022-02-05 16:17:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [9/100][1200/1251]	eta 0:00:43 lr 0.000797	time 0.5183 (0.8531)	loss 0.4922 (0.5056)	grad_norm 0.0028 (0.0076)	mem 18243MB
[2022-02-05 16:18:39 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 9 training takes 0:17:46
[2022-02-05 16:18:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][0/1251]	eta 1:24:03 lr 0.000800	time 4.0314 (4.0314)	loss 0.5028 (0.5028)	grad_norm 0.0046 (0.0046)	mem 18243MB
[2022-02-05 16:19:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][100/1251]	eta 0:09:18 lr 0.000781	time 0.4616 (0.4852)	loss 0.5053 (0.5051)	grad_norm 0.0029 (0.0082)	mem 18243MB
[2022-02-05 16:20:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][200/1251]	eta 0:10:45 lr 0.000781	time 1.2564 (0.6146)	loss 0.5208 (0.5047)	grad_norm 0.0030 (0.0077)	mem 18243MB
[2022-02-05 16:22:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][300/1251]	eta 0:13:19 lr 0.000781	time 0.4131 (0.8408)	loss 0.5163 (0.5054)	grad_norm 0.0067 (0.0078)	mem 18243MB
[2022-02-05 16:24:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][400/1251]	eta 0:12:00 lr 0.000780	time 0.4386 (0.8464)	loss 0.5159 (0.5057)	grad_norm 0.0075 (0.0083)	mem 18243MB
[2022-02-05 16:25:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][500/1251]	eta 0:09:37 lr 0.000780	time 0.4158 (0.7694)	loss 0.5114 (0.5056)	grad_norm 0.0055 (0.0083)	mem 18243MB
[2022-02-05 16:26:21 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][600/1251]	eta 0:08:21 lr 0.000780	time 0.4583 (0.7696)	loss 0.5191 (0.5058)	grad_norm 0.0064 (0.0083)	mem 18243MB
[2022-02-05 16:27:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][700/1251]	eta 0:06:53 lr 0.000779	time 0.4195 (0.7505)	loss 0.4864 (0.5056)	grad_norm 0.0081 (0.0085)	mem 18243MB
[2022-02-05 16:29:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][800/1251]	eta 0:06:11 lr 0.000779	time 1.3727 (0.8233)	loss 0.4949 (0.5058)	grad_norm 0.0031 (0.0089)	mem 18243MB
[2022-02-05 16:31:20 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][900/1251]	eta 0:04:56 lr 0.000779	time 8.4577 (0.8454)	loss 0.5168 (0.5056)	grad_norm 0.0051 (0.0087)	mem 18243MB
[2022-02-05 16:32:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1000/1251]	eta 0:03:30 lr 0.000778	time 0.4084 (0.8387)	loss 0.5202 (0.5056)	grad_norm 0.0031 (0.0088)	mem 18243MB
[2022-02-05 16:34:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1100/1251]	eta 0:02:08 lr 0.000778	time 0.4177 (0.8530)	loss 0.5111 (0.5056)	grad_norm 0.0056 (0.0087)	mem 18243MB
[2022-02-05 16:35:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [10/100][1200/1251]	eta 0:00:44 lr 0.000778	time 0.4621 (0.8640)	loss 0.4990 (0.5055)	grad_norm 0.0050 (0.0088)	mem 18243MB
[2022-02-05 16:36:41 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 10 training takes 0:18:02
[2022-02-05 16:36:41 simmim_pretrain] (utils.py 60): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saving......
[2022-02-05 16:36:44 simmim_pretrain] (utils.py 62): INFO output/simmim_pretrain/simmim_pretrain__swin_base__img192_window6__100ep/ckpt_epoch_10.pth saved !!!
[2022-02-05 16:36:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][0/1251]	eta 1:13:03 lr 0.000778	time 3.5042 (3.5042)	loss 0.5118 (0.5118)	grad_norm 0.0109 (0.0109)	mem 18243MB
[2022-02-05 16:37:48 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][100/1251]	eta 0:12:09 lr 0.000777	time 0.5750 (0.6336)	loss 0.5325 (0.5074)	grad_norm 0.0052 (0.0076)	mem 18243MB
[2022-02-05 16:39:27 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][200/1251]	eta 0:14:14 lr 0.000777	time 0.4904 (0.8130)	loss 0.5064 (0.5059)	grad_norm 0.0036 (0.0104)	mem 18243MB
[2022-02-05 16:41:03 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][300/1251]	eta 0:13:38 lr 0.000777	time 0.5833 (0.8607)	loss 0.4996 (0.5055)	grad_norm 0.0044 (0.0099)	mem 18243MB
[2022-02-05 16:42:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][400/1251]	eta 0:12:40 lr 0.000776	time 1.4644 (0.8932)	loss 0.5054 (0.5057)	grad_norm 0.0049 (0.0093)	mem 18243MB
[2022-02-05 16:43:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][500/1251]	eta 0:10:45 lr 0.000776	time 0.4532 (0.8590)	loss 0.4886 (0.5055)	grad_norm 0.0048 (0.0096)	mem 18243MB
[2022-02-05 16:45:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][600/1251]	eta 0:09:28 lr 0.000776	time 0.4424 (0.8730)	loss 0.5031 (0.5053)	grad_norm 0.0035 (0.0096)	mem 18243MB
[2022-02-05 16:46:58 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][700/1251]	eta 0:08:02 lr 0.000775	time 0.4489 (0.8760)	loss 0.5361 (0.5057)	grad_norm 0.0055 (0.0094)	mem 18243MB
[2022-02-05 16:47:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][800/1251]	eta 0:06:11 lr 0.000775	time 0.4341 (0.8229)	loss 0.4948 (0.5057)	grad_norm 0.0069 (0.0095)	mem 18243MB
[2022-02-05 16:49:23 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][900/1251]	eta 0:04:55 lr 0.000775	time 0.4356 (0.8426)	loss 0.5100 (0.5056)	grad_norm 0.0119 (0.0096)	mem 18243MB
[2022-02-05 16:50:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1000/1251]	eta 0:03:23 lr 0.000774	time 0.6975 (0.8099)	loss 0.5050 (0.5058)	grad_norm 0.0047 (0.0097)	mem 18243MB
[2022-02-05 16:52:04 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1100/1251]	eta 0:02:06 lr 0.000774	time 1.0458 (0.8359)	loss 0.5302 (0.5060)	grad_norm 0.0044 (0.0095)	mem 18243MB
[2022-02-05 16:53:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [11/100][1200/1251]	eta 0:00:43 lr 0.000773	time 0.4221 (0.8547)	loss 0.5000 (0.5061)	grad_norm 0.0074 (0.0096)	mem 18243MB
[2022-02-05 16:54:34 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 11 training takes 0:17:50
[2022-02-05 16:54:38 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][0/1251]	eta 1:16:28 lr 0.000773	time 3.6676 (3.6676)	loss 0.5215 (0.5215)	grad_norm 0.0116 (0.0116)	mem 18243MB
[2022-02-05 16:55:52 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][100/1251]	eta 0:14:50 lr 0.000773	time 0.7810 (0.7737)	loss 0.5164 (0.5068)	grad_norm 0.0069 (0.0073)	mem 18243MB
[2022-02-05 16:57:24 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][200/1251]	eta 0:14:47 lr 0.000773	time 1.7244 (0.8448)	loss 0.5187 (0.5058)	grad_norm 0.0096 (0.0076)	mem 18243MB
[2022-02-05 16:58:54 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][300/1251]	eta 0:13:41 lr 0.000772	time 0.4478 (0.8634)	loss 0.4969 (0.5061)	grad_norm 0.0054 (0.0086)	mem 18243MB
[2022-02-05 17:00:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][400/1251]	eta 0:11:49 lr 0.000772	time 0.6186 (0.8340)	loss 0.5156 (0.5064)	grad_norm 0.0082 (0.0085)	mem 18243MB
[2022-02-05 17:01:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][500/1251]	eta 0:11:01 lr 0.000772	time 0.4288 (0.8806)	loss 0.5079 (0.5062)	grad_norm 0.0048 (0.0085)	mem 18243MB
[2022-02-05 17:03:34 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][600/1251]	eta 0:09:45 lr 0.000771	time 0.4818 (0.8992)	loss 0.4947 (0.5059)	grad_norm 0.0039 (0.0084)	mem 18243MB
[2022-02-05 17:05:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][700/1251]	eta 0:08:20 lr 0.000771	time 0.4087 (0.9092)	loss 0.4817 (0.5060)	grad_norm 0.0042 (0.0083)	mem 18243MB
[2022-02-05 17:06:50 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][800/1251]	eta 0:06:54 lr 0.000770	time 3.8632 (0.9187)	loss 0.5162 (0.5057)	grad_norm 0.0056 (0.0085)	mem 18243MB
[2022-02-05 17:08:22 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][900/1251]	eta 0:05:22 lr 0.000770	time 0.6304 (0.9192)	loss 0.5097 (0.5058)	grad_norm 0.0882 (0.0122)	mem 18243MB
[2022-02-05 17:09:55 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1000/1251]	eta 0:03:50 lr 0.000770	time 0.4901 (0.9200)	loss 0.5016 (0.5059)	grad_norm 0.0458 (0.0284)	mem 18243MB
[2022-02-05 17:11:37 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1100/1251]	eta 0:02:20 lr 0.000769	time 4.5322 (0.9288)	loss 0.4981 (0.5058)	grad_norm 0.0095 (0.0943)	mem 18243MB
[2022-02-05 17:13:08 simmim_pretrain] (main_simmim.py 185): INFO Train: [12/100][1200/1251]	eta 0:00:47 lr 0.000769	time 0.6195 (0.9275)	loss 0.5148 (0.5059)	grad_norm 0.0085 (0.0929)	mem 18243MB
[2022-02-05 17:13:54 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 12 training takes 0:19:19
[2022-02-05 17:13:57 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][0/1251]	eta 1:16:38 lr 0.000769	time 3.6756 (3.6756)	loss 0.5141 (0.5141)	grad_norm 0.0134 (0.0134)	mem 18243MB
[2022-02-05 17:14:42 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][100/1251]	eta 0:09:08 lr 0.000768	time 0.4265 (0.4769)	loss 0.4834 (0.5059)	grad_norm 0.0057 (0.0109)	mem 18243MB
[2022-02-05 17:16:07 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][200/1251]	eta 0:11:38 lr 0.000768	time 0.4268 (0.6645)	loss 0.5068 (0.5065)	grad_norm 0.0053 (0.0105)	mem 18243MB
[2022-02-05 17:17:43 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][300/1251]	eta 0:12:06 lr 0.000768	time 0.5157 (0.7636)	loss 0.4987 (0.5058)	grad_norm 0.0068 (0.0111)	mem 18243MB
[2022-02-05 17:19:11 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][400/1251]	eta 0:11:13 lr 0.000767	time 0.4169 (0.7910)	loss 0.5224 (0.5060)	grad_norm 0.0052 (0.0109)	mem 18243MB
[2022-02-05 17:20:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][500/1251]	eta 0:09:47 lr 0.000767	time 0.5883 (0.7817)	loss 0.4828 (0.5060)	grad_norm 0.0175 (0.0106)	mem 18243MB
[2022-02-05 17:21:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][600/1251]	eta 0:08:12 lr 0.000766	time 3.2032 (0.7566)	loss 0.5117 (0.5061)	grad_norm 0.0134 (0.0104)	mem 18243MB
[2022-02-05 17:23:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][700/1251]	eta 0:07:20 lr 0.000766	time 0.4028 (0.8003)	loss 0.5239 (0.5060)	grad_norm 0.0044 (0.0108)	mem 18243MB
[2022-02-05 17:25:01 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][800/1251]	eta 0:06:15 lr 0.000766	time 0.3985 (0.8326)	loss 0.5014 (0.5062)	grad_norm 0.0114 (0.0110)	mem 18243MB
[2022-02-05 17:26:39 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][900/1251]	eta 0:04:58 lr 0.000765	time 0.6148 (0.8492)	loss 0.5003 (0.5062)	grad_norm 0.0027 (0.0109)	mem 18243MB
[2022-02-05 17:28:16 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1000/1251]	eta 0:03:36 lr 0.000765	time 0.4273 (0.8616)	loss 0.4993 (0.5062)	grad_norm 0.0140 (0.0108)	mem 18243MB
[2022-02-05 17:30:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1100/1251]	eta 0:02:12 lr 0.000764	time 0.4532 (0.8779)	loss 0.5004 (0.5061)	grad_norm 0.0107 (0.0109)	mem 18243MB
[2022-02-05 17:31:28 simmim_pretrain] (main_simmim.py 185): INFO Train: [13/100][1200/1251]	eta 0:00:44 lr 0.000764	time 0.5378 (0.8783)	loss 0.4941 (0.5061)	grad_norm 0.0027 (0.0106)	mem 18243MB
[2022-02-05 17:32:10 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 13 training takes 0:18:16
[2022-02-05 17:32:14 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][0/1251]	eta 1:28:50 lr 0.000764	time 4.2612 (4.2612)	loss 0.4906 (0.4906)	grad_norm 0.0057 (0.0057)	mem 18243MB
[2022-02-05 17:33:00 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][100/1251]	eta 0:09:30 lr 0.000763	time 0.4356 (0.4958)	loss 0.4973 (0.5057)	grad_norm 0.0031 (0.0087)	mem 18243MB
[2022-02-05 17:34:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][200/1251]	eta 0:11:46 lr 0.000763	time 0.4247 (0.6724)	loss 0.5087 (0.5059)	grad_norm 0.0055 (0.0088)	mem 18243MB
[2022-02-05 17:36:18 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][300/1251]	eta 0:13:02 lr 0.000763	time 0.5249 (0.8233)	loss 0.5212 (0.5071)	grad_norm 0.0065 (0.0095)	mem 18243MB
[2022-02-05 17:37:56 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][400/1251]	eta 0:12:15 lr 0.000762	time 0.6521 (0.8638)	loss 0.4970 (0.5065)	grad_norm 0.0145 (0.0094)	mem 18243MB
[2022-02-05 17:39:36 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][500/1251]	eta 0:11:08 lr 0.000762	time 1.7114 (0.8902)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:41:15 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][600/1251]	eta 0:09:49 lr 0.000761	time 3.6215 (0.9061)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:42:51 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][700/1251]	eta 0:08:23 lr 0.000761	time 0.5390 (0.9145)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:44:25 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][800/1251]	eta 0:06:54 lr 0.000761	time 0.8262 (0.9181)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:46:05 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][900/1251]	eta 0:05:25 lr 0.000760	time 0.7292 (0.9266)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:47:41 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1000/1251]	eta 0:03:53 lr 0.000760	time 0.5330 (0.9303)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:49:13 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1100/1251]	eta 0:02:20 lr 0.000759	time 1.6054 (0.9295)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:50:47 simmim_pretrain] (main_simmim.py 185): INFO Train: [14/100][1200/1251]	eta 0:00:47 lr 0.000759	time 0.4148 (0.9302)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:51:29 simmim_pretrain] (main_simmim.py 192): INFO EPOCH 14 training takes 0:19:19
[2022-02-05 17:51:33 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][0/1251]	eta 1:25:26 lr 0.000759	time 4.0980 (4.0980)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:52:17 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][100/1251]	eta 0:09:03 lr 0.000758	time 0.4545 (0.4721)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:53:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][200/1251]	eta 0:11:56 lr 0.000758	time 0.4286 (0.6820)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB
[2022-02-05 17:54:46 simmim_pretrain] (main_simmim.py 185): INFO Train: [15/100][300/1251]	eta 0:10:23 lr 0.000757	time 0.4350 (0.6558)	loss nan (nan)	grad_norm nan (nan)	mem 18243MB

I did not modify any of the configs except for specifying --accumulation-steps 2 on the command line so that training fits in memory on an 8-GPU machine. I'm using CUDA 11.1, cuDNN 8, and PyTorch 1.9.0 (which should be sufficiently new). Could you help take a look at what went wrong and how to fix it?
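
In case it helps with debugging, below is a minimal sketch of the AMP + gradient-accumulation training step I would expect with --accumulation-steps 2 (my own reconstruction, not the exact code in main_simmim.py; the clip_grad value and variable names are placeholders). If the GradScaler is repeatedly finding inf/NaN gradients and skipping optimizer steps, that would be consistent with the grad_norm turning inf around epoch 7 before the loss itself goes NaN:

import torch

# Assumed pattern only; placeholder hyperparameters.
scaler = torch.cuda.amp.GradScaler()
accumulation_steps = 2   # matches --accumulation-steps 2
clip_grad = 5.0          # placeholder clip value

def train_one_epoch(model, optimizer, loader):
    optimizer.zero_grad()
    for idx, (img, mask) in enumerate(loader):
        with torch.cuda.amp.autocast():
            loss = model(img.cuda(non_blocking=True), mask.cuda(non_blocking=True))
            loss = loss / accumulation_steps              # average over accumulated micro-batches
        scaler.scale(loss).backward()                     # scaled backward to avoid fp16 underflow
        if (idx + 1) % accumulation_steps == 0:
            scaler.unscale_(optimizer)                    # unscale before measuring/clipping grads
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad)
            scaler.step(optimizer)                        # internally skipped if grads are inf/NaN
            scaler.update()
            optimizer.zero_grad()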

Thank you!

DianCh · Feb 07 '22

I encountered the same problem. How did you solve it?

xiaofei05 · Jul 20 '22