random seed
While reproducing your work, I fixed the random seed, but the experimental results still fluctuate across runs. What could be the reason for this?
Hi! Sorry for the trouble you encountered while reproducing the work. A large portion of the experiments were conducted on the Hopper GPU architecture, which employs acceleration features that modify the floating-point calculation process. Here is a list of the experiments conducted on 3090 GPUs:
- All experiments on ViT-Base.
- More than 50% of the experiments on ViT-Large.
Sorry again for any inconvenience caused! We are currently working on uploading all the model weights needed to replicate the experiments in the article. Please keep an eye on this repository; they will be available very soon (within a few days).
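In case it helps narrow down architecture-dependent differences: one such acceleration feature is TF32, which newer GPU architectures can use by default for float32 matmuls and convolutions, changing floating-point results relative to older cards like the 3090. It can be disabled explicitly. This is a minimal sketch, not code taken from this repository:

```python
import torch

# TF32 is a reduced-precision mode for float32 matmuls/convolutions on
# Ampere and newer GPUs. Forcing full FP32 removes one source of
# architecture-dependent numerical differences.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

These flags can be set at the top of the training script, before any model or data is moved to the GPU.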
What I mean is that during my reproduction process, I fixed the random seed, yet the results were different each time. Moreover, after removing the token merging code, the issue no longer occurred.
Try making sure that you disable any operations that might introduce non-determinism:

```python
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

It might help :)
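If those two flags alone don't fix it, `torch.use_deterministic_algorithms(True)` makes PyTorch raise an error on any op that has no deterministic implementation, which points directly at the culprit. A minimal sketch (the `seed_everything` helper and the toy model are illustrative, not from this repo):

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Seed every RNG that typical PyTorch training code touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Error out on ops with no deterministic implementation instead of
# silently taking the fast non-deterministic path. cuBLAS needs this
# env var set before any CUDA context is created.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)

def run_once(seed: int) -> torch.Tensor:
    seed_everything(seed)
    x = torch.randn(4, 8)
    layer = torch.nn.Linear(8, 2)
    return layer(x)

a = run_once(0)
b = run_once(0)
assert torch.equal(a, b)  # identical seeds give identical results
```

Running the actual training script with `torch.use_deterministic_algorithms(True)` enabled should either reproduce exactly or crash on the first non-deterministic op it hits.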
The random seed has already been fixed at the beginning of the main function. Below is my random seed code, yet the issue still occurs.

```python
if seed is not None:
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```
Did you see any warnings when running the training script? If it's convenient, could you please share the training logs?
Namespace(IS_not_position_VPT=False, aa='rand-m9-mstd0.5-inc1', amp=False, batch_size=32, cfg='experiments/LoRA/ViT-B_prompt_lora_12.yaml', change_qkv=False, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=0.0, cutmix_minmax=None, data_path='/data/dataset/liangjunjie/vtab-1k/caltech101', data_set='caltech101', decay_epochs=30, decay_rate=0.1, device='cuda', direct_resize=True, dist_eval=False, dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, drop_rate_LoRA=0.1, drop_rate_adapter=0.1, drop_rate_prompt=0.1, epochs=100, eval=False, few_shot_seed=3072, few_shot_shot=2, gp=False, inat_category='name', inception=False, input_size=224, launcher='none', lr=0.0015, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, lr_power=1.0, max_relative_position=14, merging_schedule='high', min_lr=1e-05, mixup=0.0, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='', model_ema=False, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, no_abs_pos=False, no_aug=True, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='outputs/20250307_092511_ViT-B_prompt_lora_12_compress_high_PYRA_LR_0.0001/caltech101_lr-0.0015_wd-0.0001', patch_size=16, patience_epochs=10, pin_mem=True, platform='pai', post_norm=False, pyra=True, pyra_lr=0.0001, recount=1, relative_position=False, remode='pixel', repeated_aug=False, reprob=0.25, resplit=False, resume='/data/dataset/liangjunjie/liangjunjie/tokenLoRA-main/ViT-B_16-224.npz', rpe_type='bias', save_ckpt=False, sched='cosine', seed=3072, separate_lr_for_pyra=True, smoothing=0.0, start_epoch=0, teacher_model='', test_batch_size=512, token_merging=True, train_interpolation='bicubic', val_interval=10, warmup_epochs=10, warmup_lr=1e-06, weight_decay=0.0001, world_size=1) mixup_active False Creating SuperVisionTransformer {'MODEL_NAME': 'vit_base_patch16_224_in21k', 'DEPTH': 12, 'VISUAL_PROMPT_DIM': 0, 'LORA_DIM': 12, 'ADAPTER_DIM': 0, 'PREFIX_DIM': 0, 'PROMPT_DEPTH': 0} 
LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 LoRA_dim 12 prefix_dim 0 visual_prompt_dim 0 adapter 0 drop_rate 0.0 attn_drop_rate 0.0 VisionTransformer True PatchEmbed False Conv2d False Identity False Dropout False simam_module True Sigmoid True Sequential True Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False Identity False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False 
Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False 
Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False Block False LayerNorm False Attention False Linear False Dropout False Linear False Dropout False Linear True Linear True Dropout True Dropout False DropPath False LayerNorm False Mlp False Linear False GELU False Dropout False Linear False Dropout False Dropout True Adapter False Dropout False LayerNorm False LayerNorm True Identity True Linear True Dropout True Token merging initialization. cls_token False pos_embed False patch_embed.proj.weight False patch_embed.proj.bias False blocks.0.pyra_a True blocks.0.pyra_b True blocks.0.norm1.weight False blocks.0.norm1.bias False blocks.0.attn.qkv.weight False blocks.0.attn.qkv.bias False blocks.0.attn.proj.weight False blocks.0.attn.proj.bias False blocks.0.attn.LoRA_a.weight True blocks.0.attn.LoRA_b.weight True blocks.0.norm2.weight False blocks.0.norm2.bias False blocks.0.mlp.fc1.weight False blocks.0.mlp.fc1.bias False blocks.0.mlp.fc2.weight False blocks.0.mlp.fc2.bias False blocks.0.pyra_norm.weight True blocks.0.pyra_norm.bias True blocks.1.pyra_a True blocks.1.pyra_b True blocks.1.norm1.weight False blocks.1.norm1.bias False blocks.1.attn.qkv.weight False blocks.1.attn.qkv.bias False blocks.1.attn.proj.weight False blocks.1.attn.proj.bias False blocks.1.attn.LoRA_a.weight True blocks.1.attn.LoRA_b.weight True blocks.1.norm2.weight False blocks.1.norm2.bias False blocks.1.mlp.fc1.weight False blocks.1.mlp.fc1.bias False blocks.1.mlp.fc2.weight False blocks.1.mlp.fc2.bias False blocks.1.pyra_norm.weight True blocks.1.pyra_norm.bias True blocks.2.pyra_a True blocks.2.pyra_b True blocks.2.norm1.weight False blocks.2.norm1.bias False blocks.2.attn.qkv.weight False blocks.2.attn.qkv.bias False blocks.2.attn.proj.weight False blocks.2.attn.proj.bias False 
blocks.2.attn.LoRA_a.weight True blocks.2.attn.LoRA_b.weight True blocks.2.norm2.weight False blocks.2.norm2.bias False blocks.2.mlp.fc1.weight False blocks.2.mlp.fc1.bias False blocks.2.mlp.fc2.weight /home/liangjunjie/anaconda3/envs/PYRA/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details. warnings.warn( /home/liangjunjie/anaconda3/envs/PYRA/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum. warnings.warn( /data/dataset/liangjunjie/liangjunjie/PYRA-main/model/adaptive_merge.py:196: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/TensorAdvancedIndexing.cpp:1550.) dst = dst.scatter_reduce(-2, dst_idx.expand(n, r, c), src, reduce=mode) /data/dataset/liangjunjie/liangjunjie/PYRA-main/model/tome.py:194: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). .reshape(B, N, 3, self.num_heads, C // self.num_heads) /data/dataset/liangjunjie/liangjunjie/PYRA-main/model/tome.py:200: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). 
This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). qkv_delta = self.LoRA_b(qkv_delta).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) /data/dataset/liangjunjie/liangjunjie/PYRA-main/model/adaptive_merge.py:155: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). r = min(r, (t - protected) // 2) Unsupported operator aten::mul encountered 75 time(s) Unsupported operator aten::sub encountered 38 time(s) Unsupported operator aten::mean encountered 13 time(s) Unsupported operator aten::pow encountered 1 time(s) Unsupported operator aten::sum encountered 1 time(s) Unsupported operator aten::div encountered 50 time(s) Unsupported operator aten::add encountered 98 time(s) Unsupported operator aten::sigmoid encountered 25 time(s) Unsupported operator aten::softmax encountered 12 time(s) Unsupported operator aten::frobenius_norm encountered 12 time(s) Unsupported operator aten::fill_ encountered 12 time(s) Unsupported operator aten::argsort encountered 12 time(s) Unsupported operator aten::sort encountered 12 time(s) Unsupported operator aten::ones_like encountered 1 time(s) Unsupported operator aten::scatter_reduce encountered 24 time(s) Unsupported operator aten::gelu encountered 12 time(s) Unsupported operator aten::log encountered 11 time(s) The following submodules of the model were never called during the trace of the graph. They may be unused, or they were accessed by direct calls to .forward() or via other python methods. 
In the latter case they will have zeros for statistics, though their statistics will still contribute to their parent calling module. blocks.0.adapter, blocks.0.adapter.dropout, blocks.0.attn.prefix_drop, blocks.0.drop_prompt, blocks.1.adapter, blocks.1.adapter.dropout, blocks.1.attn.prefix_drop, blocks.1.drop_path, blocks.1.drop_prompt, blocks.10.adapter, blocks.10.adapter.dropout, blocks.10.attn.prefix_drop, blocks.10.drop_path, blocks.10.drop_prompt, blocks.11.adapter, blocks.11.adapter.dropout, blocks.11.attn.prefix_drop, blocks.11.drop_path, blocks.11.drop_prompt, blocks.2.adapter, blocks.2.adapter.dropout, blocks.2.attn.prefix_drop, blocks.2.drop_path, blocks.2.drop_prompt, blocks.3.adapter, blocks.3.adapter.dropout, blocks.3.attn.prefix_drop, blocks.3.drop_path, blocks.3.drop_prompt, blocks.4.adapter, blocks.4.adapter.dropout, blocks.4.attn.prefix_drop, blocks.4.drop_path, blocks.4.drop_prompt, blocks.5.adapter, blocks.5.adapter.dropout, blocks.5.attn.prefix_drop, blocks.5.drop_path, blocks.5.drop_prompt, blocks.6.adapter, blocks.6.adapter.dropout, blocks.6.attn.prefix_drop, blocks.6.drop_path, blocks.6.drop_prompt, blocks.7.adapter, blocks.7.adapter.dropout, blocks.7.attn.prefix_drop, blocks.7.drop_path, blocks.7.drop_prompt, blocks.8.adapter, blocks.8.adapter.dropout, blocks.8.attn.prefix_drop, blocks.8.drop_path, blocks.8.drop_prompt, blocks.9.adapter, blocks.9.adapter.dropout, blocks.9.attn.prefix_drop, blocks.9.drop_path, blocks.9.drop_prompt, drop_prompt False blocks.2.mlp.fc2.bias False blocks.2.pyra_norm.weight True blocks.2.pyra_norm.bias True blocks.3.pyra_a True blocks.3.pyra_b True blocks.3.norm1.weight False blocks.3.norm1.bias False blocks.3.attn.qkv.weight False blocks.3.attn.qkv.bias False blocks.3.attn.proj.weight False blocks.3.attn.proj.bias False blocks.3.attn.LoRA_a.weight True blocks.3.attn.LoRA_b.weight True blocks.3.norm2.weight False blocks.3.norm2.bias False blocks.3.mlp.fc1.weight False blocks.3.mlp.fc1.bias False 
blocks.3.mlp.fc2.weight False blocks.3.mlp.fc2.bias False blocks.3.pyra_norm.weight True blocks.3.pyra_norm.bias True blocks.4.pyra_a True blocks.4.pyra_b True blocks.4.norm1.weight False blocks.4.norm1.bias False blocks.4.attn.qkv.weight False blocks.4.attn.qkv.bias False blocks.4.attn.proj.weight False blocks.4.attn.proj.bias False blocks.4.attn.LoRA_a.weight True blocks.4.attn.LoRA_b.weight True blocks.4.norm2.weight False blocks.4.norm2.bias False blocks.4.mlp.fc1.weight False blocks.4.mlp.fc1.bias False blocks.4.mlp.fc2.weight False blocks.4.mlp.fc2.bias False blocks.4.pyra_norm.weight True blocks.4.pyra_norm.bias True blocks.5.pyra_a True blocks.5.pyra_b True blocks.5.norm1.weight False blocks.5.norm1.bias False blocks.5.attn.qkv.weight False blocks.5.attn.qkv.bias False blocks.5.attn.proj.weight False blocks.5.attn.proj.bias False blocks.5.attn.LoRA_a.weight True blocks.5.attn.LoRA_b.weight True blocks.5.norm2.weight False blocks.5.norm2.bias False blocks.5.mlp.fc1.weight False blocks.5.mlp.fc1.bias False blocks.5.mlp.fc2.weight False blocks.5.mlp.fc2.bias False blocks.5.pyra_norm.weight True blocks.5.pyra_norm.bias True blocks.6.pyra_a True blocks.6.pyra_b True blocks.6.norm1.weight False blocks.6.norm1.bias False blocks.6.attn.qkv.weight False blocks.6.attn.qkv.bias False blocks.6.attn.proj.weight False blocks.6.attn.proj.bias False blocks.6.attn.LoRA_a.weight True blocks.6.attn.LoRA_b.weight True blocks.6.norm2.weight False blocks.6.norm2.bias False blocks.6.mlp.fc1.weight False blocks.6.mlp.fc1.bias False blocks.6.mlp.fc2.weight False blocks.6.mlp.fc2.bias False blocks.6.pyra_norm.weight True blocks.6.pyra_norm.bias True blocks.7.pyra_a True blocks.7.pyra_b True blocks.7.norm1.weight False blocks.7.norm1.bias False blocks.7.attn.qkv.weight False blocks.7.attn.qkv.bias False blocks.7.attn.proj.weight False blocks.7.attn.proj.bias False blocks.7.attn.LoRA_a.weight True blocks.7.attn.LoRA_b.weight True blocks.7.norm2.weight False blocks.7.norm2.bias False 
blocks.7.mlp.fc1.weight False blocks.7.mlp.fc1.bias False blocks.7.mlp.fc2.weight False blocks.7.mlp.fc2.bias False blocks.7.pyra_norm.weight True blocks.7.pyra_norm.bias True blocks.8.pyra_a True blocks.8.pyra_b True blocks.8.norm1.weight False blocks.8.norm1.bias False blocks.8.attn.qkv.weight False blocks.8.attn.qkv.bias False blocks.8.attn.proj.weight False blocks.8.attn.proj.bias False blocks.8.attn.LoRA_a.weight True blocks.8.attn.LoRA_b.weight True blocks.8.norm2.weight False blocks.8.norm2.bias False blocks.8.mlp.fc1.weight False blocks.8.mlp.fc1.bias False blocks.8.mlp.fc2.weight False blocks.8.mlp.fc2.bias False blocks.8.pyra_norm.weight True blocks.8.pyra_norm.bias True blocks.9.pyra_a True blocks.9.pyra_b True blocks.9.norm1.weight False blocks.9.norm1.bias False blocks.9.attn.qkv.weight False blocks.9.attn.qkv.bias False blocks.9.attn.proj.weight False blocks.9.attn.proj.bias False blocks.9.attn.LoRA_a.weight True blocks.9.attn.LoRA_b.weight True blocks.9.norm2.weight False blocks.9.norm2.bias False blocks.9.mlp.fc1.weight False blocks.9.mlp.fc1.bias False blocks.9.mlp.fc2.weight False blocks.9.mlp.fc2.bias False blocks.9.pyra_norm.weight True blocks.9.pyra_norm.bias True blocks.10.pyra_a True blocks.10.pyra_b True blocks.10.norm1.weight False blocks.10.norm1.bias False blocks.10.attn.qkv.weight False blocks.10.attn.qkv.bias False blocks.10.attn.proj.weight False blocks.10.attn.proj.bias False blocks.10.attn.LoRA_a.weight True blocks.10.attn.LoRA_b.weight True blocks.10.norm2.weight False blocks.10.norm2.bias False blocks.10.mlp.fc1.weight False blocks.10.mlp.fc1.bias False blocks.10.mlp.fc2.weight False blocks.10.mlp.fc2.bias False blocks.10.pyra_norm.weight True blocks.10.pyra_norm.bias True blocks.11.pyra_a True blocks.11.pyra_b True blocks.11.norm1.weight False blocks.11.norm1.bias False blocks.11.attn.qkv.weight False blocks.11.attn.qkv.bias False blocks.11.attn.proj.weight False blocks.11.attn.proj.bias False blocks.11.attn.LoRA_a.weight True 
blocks.11.attn.LoRA_b.weight True blocks.11.norm2.weight False blocks.11.norm2.bias False blocks.11.mlp.fc1.weight False blocks.11.mlp.fc1.bias False blocks.11.mlp.fc2.weight False blocks.11.mlp.fc2.bias False blocks.11.pyra_norm.weight True blocks.11.pyra_norm.bias True norm.weight False norm.bias False head.weight True head.bias True total training parameters: 548646 adapter 0 LoRA 442368 prompt 0 prefix 0 PYRA 27840 head 78438 total parameters in model: 86347302 Test model latency (images per second) Model latency: 2237.085784754639 imgs/s Count model FLOPS (GFLOPS) Total FLOPS:5109099520, GFLOPS:4.7582 {'': Counter({'linear': 4835713536, 'matmul': 151563520, 'conv': 115605504, 'layer_norm': 6216960}), 'patch_embed': Counter({'conv': 115605504}), 'patch_embed.proj': Counter({'conv': 115605504}), 'patch_embed.norm': Counter(), 'pos_drop': Counter(), 'simam': Counter(), 'simam.activaton': Counter(), 'blocks': Counter({'linear': 4835635200, 'matmul': 151563520, 'layer_norm': 6197760}), 'blocks.0': Counter({'linear': 1212862464, 'matmul': 60292992, 'layer_norm': 1512960}), 'blocks.0.norm1': Counter({'layer_norm': 756480}), 'blocks.0.attn': Counter({'linear': 472043520, 'matmul': 59610624}), 'blocks.0.attn.qkv': Counter({'linear': 348585984}), 'blocks.0.attn.attn_drop': Counter(), 'blocks.0.attn.proj': Counter({'linear': 116195328}), 'blocks.0.attn.proj_drop': Counter(), 'blocks.0.attn.LoRA_a': Counter({'linear': 1815552}), 'blocks.0.attn.LoRA_b': Counter({'linear': 5446656}), 'blocks.0.attn.LoRA_drop': Counter(), 'blocks.0.attn.prefix_drop': Counter(), 'blocks.0.drop_path': Counter(), 'blocks.0.norm2': Counter({'layer_norm': 602880}), 'blocks.0.mlp': Counter({'linear': 740818944}), 'blocks.0.mlp.fc1': Counter({'linear': 370409472}), 'blocks.0.mlp.act': Counter(), 'blocks.0.mlp.drop1': Counter(), 'blocks.0.mlp.fc2': Counter({'linear': 370409472}), 'blocks.0.mlp.drop2': Counter(), 'blocks.0.drop_prompt': Counter(), 'blocks.0.adapter': Counter(), 
'blocks.0.adapter.dropout': Counter(), 'blocks.0.pyra_norm': Counter({'layer_norm': 153600}), 'blocks.1': Counter({'linear': 956583936, 'matmul': 38307456, 'layer_norm': 1205760}), 'blocks.1.norm1': Counter({'layer_norm': 602880}), 'blocks.1.attn': Counter({'linear': 376197120, 'matmul': 37860864}), 'blocks.1.attn.qkv': Counter({'linear': 277807104}), 'blocks.1.attn.attn_drop': Counter(), 'blocks.1.attn.proj': Counter({'linear': 92602368}), 'blocks.1.attn.proj_drop': Counter(), 'blocks.1.attn.LoRA_a': Counter({'linear': 1446912}), 'blocks.1.attn.LoRA_b': Counter({'linear': 4340736}), 'blocks.1.attn.LoRA_drop': Counter(), 'blocks.1.attn.prefix_drop': Counter(), 'blocks.1.drop_path': Counter(), 'blocks.1.norm2': Counter({'layer_norm': 472320}), 'blocks.1.mlp': Counter({'linear': 580386816}), 'blocks.1.mlp.fc1': Counter({'linear': 290193408}), 'blocks.1.mlp.act': Counter(), 'blocks.1.mlp.drop1': Counter(), 'blocks.1.mlp.fc2': Counter({'linear': 290193408}), 'blocks.1.mlp.drop2': Counter(), 'blocks.1.drop_prompt': Counter(), 'blocks.1.adapter': Counter(), 'blocks.1.adapter.dropout': Counter(), 'blocks.1.pyra_norm': Counter({'layer_norm': 130560}), 'blocks.2': Counter({'linear': 733556736, 'matmul': 23526272, 'layer_norm': 944640}), 'blocks.2.norm1': Counter({'layer_norm': 472320}), 'blocks.2.attn': Counter({'linear': 294727680, 'matmul': 23238144}), 'blocks.2.attn.qkv': Counter({'linear': 217645056}), 'blocks.2.attn.attn_drop': Counter(), 'blocks.2.attn.proj': Counter({'linear': 72548352}), 'blocks.2.attn.proj_drop': Counter(), 'blocks.2.attn.LoRA_a': Counter({'linear': 1133568}), 'blocks.2.attn.LoRA_b': Counter({'linear': 3400704}), 'blocks.2.attn.LoRA_drop': Counter(), 'blocks.2.attn.prefix_drop': Counter(), 'blocks.2.drop_path': Counter(), 'blocks.2.norm2': Counter({'layer_norm': 357120}), 'blocks.2.mlp': Counter({'linear': 438829056}), 'blocks.2.mlp.fc1': Counter({'linear': 219414528}), 'blocks.2.mlp.act': Counter(), 'blocks.2.mlp.drop1': Counter(), 
'blocks.2.mlp.fc2': Counter({'linear': 219414528}), 'blocks.2.mlp.drop2': Counter(), 'blocks.2.drop_prompt': Counter(), 'blocks.2.adapter': Counter(), 'blocks.2.adapter.dropout': Counter(), 'blocks.2.pyra_norm': Counter({'layer_norm': 115200}), 'blocks.3': Counter({'linear': 548425728, 'matmul': 13460096, 'layer_norm': 714240}), 'blocks.3.norm1': Counter({'layer_norm': 357120}), 'blocks.3.attn': Counter({'linear': 222842880, 'matmul': 13284864}), 'blocks.3.attn.qkv': Counter({'linear': 164560896}), 'blocks.3.attn.attn_drop': Counter(), 'blocks.3.attn.proj': Counter({'linear': 54853632}), 'blocks.3.attn.proj_drop': Counter(), 'blocks.3.attn.LoRA_a': Counter({'linear': 857088}), 'blocks.3.attn.LoRA_b': Counter({'linear': 2571264}), 'blocks.3.attn.LoRA_drop': Counter(), 'blocks.3.attn.prefix_drop': Counter(), 'blocks.3.drop_path': Counter(), 'blocks.3.norm2': Counter({'layer_norm': 264960}), 'blocks.3.mlp': Counter({'linear': 325582848}), 'blocks.3.mlp.fc1': Counter({'linear': 162791424}), 'blocks.3.mlp.act': Counter(), 'blocks.3.mlp.drop1': Counter(), 'blocks.3.mlp.fc2': Counter({'linear': 162791424}), 'blocks.3.mlp.drop2': Counter(), 'blocks.3.drop_prompt': Counter(), 'blocks.3.adapter': Counter(), 'blocks.3.adapter.dropout': Counter(), 'blocks.3.pyra_norm': Counter({'layer_norm': 92160}), 'blocks.4': Counter({'linear': 405983232, 'matmul': 7416704, 'layer_norm': 529920}), 'blocks.4.norm1': Counter({'layer_norm': 264960}), 'blocks.4.attn': Counter({'linear': 165335040, 'matmul': 7312896}), 'blocks.4.attn.qkv': Counter({'linear': 122093568}), 'blocks.4.attn.attn_drop': Counter(), 'blocks.4.attn.proj': Counter({'linear': 40697856}), 'blocks.4.attn.proj_drop': Counter(), 'blocks.4.attn.LoRA_a': Counter({'linear': 635904}), 'blocks.4.attn.LoRA_b': Counter({'linear': 1907712}), 'blocks.4.attn.LoRA_drop': Counter(), 'blocks.4.attn.prefix_drop': Counter(), 'blocks.4.drop_path': Counter(), 'blocks.4.norm2': Counter({'layer_norm': 195840}), 'blocks.4.mlp': Counter({'linear': 
240648192}), 'blocks.4.mlp.fc1': Counter({'linear': 120324096}), 'blocks.4.mlp.act': Counter(), 'blocks.4.mlp.drop1': Counter(), 'blocks.4.mlp.fc2': Counter({'linear': 120324096}), 'blocks.4.mlp.drop2': Counter(), 'blocks.4.drop_prompt': Counter(), 'blocks.4.adapter': Counter(), 'blocks.4.adapter.dropout': Counter(), 'blocks.4.pyra_norm': Counter({'layer_norm': 69120}), 'blocks.5': Counter({'linear': 296792064, 'matmul': 4058240, 'layer_norm': 391680}), 'blocks.5.norm1': Counter({'layer_norm': 195840}), 'blocks.5.attn': Counter({'linear': 122204160, 'matmul': 3995136}), 'blocks.5.attn.qkv': Counter({'linear': 90243072}), 'blocks.5.attn.attn_drop': Counter(), 'blocks.5.attn.proj': Counter({'linear': 30081024}), 'blocks.5.attn.proj_drop': Counter(), 'blocks.5.attn.LoRA_a': Counter({'linear': 470016}), 'blocks.5.attn.LoRA_b': Counter({'linear': 1410048}), 'blocks.5.attn.LoRA_drop': Counter(), 'blocks.5.attn.prefix_drop': Counter(), 'blocks.5.drop_path': Counter(), 'blocks.5.norm2': Counter({'layer_norm': 142080}), 'blocks.5.mlp': Counter({'linear': 174587904}), 'blocks.5.mlp.fc1': Counter({'linear': 87293952}), 'blocks.5.mlp.act': Counter(), 'blocks.5.mlp.drop1': Counter(), 'blocks.5.mlp.fc2': Counter({'linear': 87293952}), 'blocks.5.mlp.drop2': Counter(), 'blocks.5.drop_prompt': Counter(), 'blocks.5.adapter': Counter(), 'blocks.5.adapter.dropout': Counter(), 'blocks.5.pyra_norm': Counter({'layer_norm': 53760}), 'blocks.6': Counter({'linear': 216059904, 'matmul': 2140032, 'layer_norm': 284160}), 'blocks.6.norm1': Counter({'layer_norm': 142080}), 'blocks.6.attn': Counter({'linear': 88657920, 'matmul': 2102784}), 'blocks.6.attn.qkv': Counter({'linear': 65470464}), 'blocks.6.attn.attn_drop': Counter(), 'blocks.6.attn.proj': Counter({'linear': 21823488}), 'blocks.6.attn.proj_drop': Counter(), 'blocks.6.attn.LoRA_a': Counter({'linear': 340992}), 'blocks.6.attn.LoRA_b': Counter({'linear': 1022976}), 'blocks.6.attn.LoRA_drop': Counter(), 'blocks.6.attn.prefix_drop': 
Counter(), 'blocks.6.drop_path': Counter(), 'blocks.6.norm2': Counter({'layer_norm': 103680}), 'blocks.6.mlp': Counter({'linear': 127401984}), 'blocks.6.mlp.fc1': Counter({'linear': 63700992}), 'blocks.6.mlp.act': Counter(), 'blocks.6.mlp.drop1': Counter(), 'blocks.6.mlp.fc2': Counter({'linear': 63700992}), 'blocks.6.mlp.drop2': Counter(), 'blocks.6.drop_prompt': Counter(), 'blocks.6.adapter': Counter(), 'blocks.6.adapter.dropout': Counter(), 'blocks.6.pyra_norm': Counter({'layer_norm': 38400}), 'blocks.7': Counter({'linear': 154349568, 'matmul': 1143680, 'layer_norm': 207360}), 'blocks.7.norm1': Counter({'layer_norm': 103680}), 'blocks.7.attn': Counter({'linear': 64696320, 'matmul': 1119744}), 'blocks.7.attn.qkv': Counter({'linear': 47775744}), 'blocks.7.attn.attn_drop': Counter(), 'blocks.7.attn.proj': Counter({'linear': 15925248}), 'blocks.7.attn.proj_drop': Counter(), 'blocks.7.attn.LoRA_a': Counter({'linear': 248832}), 'blocks.7.attn.LoRA_b': Counter({'linear': 746496}), 'blocks.7.attn.LoRA_drop': Counter(), 'blocks.7.attn.prefix_drop': Counter(), 'blocks.7.drop_path': Counter(), 'blocks.7.norm2': Counter({'layer_norm': 72960}), 'blocks.7.mlp': Counter({'linear': 89653248}), 'blocks.7.mlp.fc1': Counter({'linear': 44826624}), 'blocks.7.mlp.act': Counter(), 'blocks.7.mlp.drop1': Counter(), 'blocks.7.mlp.fc2': Counter({'linear': 44826624}), 'blocks.7.mlp.drop2': Counter(), 'blocks.7.drop_prompt': Counter(), 'blocks.7.adapter': Counter(), 'blocks.7.adapter.dropout': Counter(), 'blocks.7.pyra_norm': Counter({'layer_norm': 30720}), 'blocks.8': Counter({'linear': 116305920, 'matmul': 566400, 'layer_norm': 145920}), 'blocks.8.norm1': Counter({'layer_norm': 72960}), 'blocks.8.attn': Counter({'linear': 45527040, 'matmul': 554496}), 'blocks.8.attn.qkv': Counter({'linear': 33619968}), 'blocks.8.attn.attn_drop': Counter(), 'blocks.8.attn.proj': Counter({'linear': 11206656}), 'blocks.8.attn.proj_drop': Counter(), 'blocks.8.attn.LoRA_a': Counter({'linear': 175104}), 
'blocks.8.attn.LoRA_b': Counter({'linear': 525312}), 'blocks.8.attn.LoRA_drop': Counter(), 'blocks.8.attn.prefix_drop': Counter(), 'blocks.8.drop_path': Counter(), 'blocks.8.norm2': Counter({'layer_norm': 57600}), 'blocks.8.mlp': Counter({'linear': 70778880}), 'blocks.8.mlp.fc1': Counter({'linear': 35389440}), 'blocks.8.mlp.act': Counter(), 'blocks.8.mlp.drop1': Counter(), 'blocks.8.mlp.fc2': Counter({'linear': 35389440}), 'blocks.8.mlp.drop2': Counter(), 'blocks.8.drop_prompt': Counter(), 'blocks.8.adapter': Counter(), 'blocks.8.adapter.dropout': Counter(), 'blocks.8.pyra_norm': Counter({'layer_norm': 15360}), 'blocks.9': Counter({'linear': 87846912, 'matmul': 355328, 'layer_norm': 115200}), 'blocks.9.norm1': Counter({'layer_norm': 57600}), 'blocks.9.attn': Counter({'linear': 35942400, 'matmul': 345600}), 'blocks.9.attn.qkv': Counter({'linear': 26542080}), 'blocks.9.attn.attn_drop': Counter(), 'blocks.9.attn.proj': Counter({'linear': 8847360}), 'blocks.9.attn.proj_drop': Counter(), 'blocks.9.attn.LoRA_a': Counter({'linear': 138240}), 'blocks.9.attn.LoRA_b': Counter({'linear': 414720}), 'blocks.9.attn.LoRA_drop': Counter(), 'blocks.9.attn.prefix_drop': Counter(), 'blocks.9.drop_path': Counter(), 'blocks.9.norm2': Counter({'layer_norm': 42240}), 'blocks.9.mlp': Counter({'linear': 51904512}), 'blocks.9.mlp.fc1': Counter({'linear': 25952256}), 'blocks.9.mlp.act': Counter(), 'blocks.9.mlp.drop1': Counter(), 'blocks.9.mlp.fc2': Counter({'linear': 25952256}), 'blocks.9.mlp.drop2': Counter(), 'blocks.9.drop_prompt': Counter(), 'blocks.9.adapter': Counter(), 'blocks.9.adapter.dropout': Counter(), 'blocks.9.pyra_norm': Counter({'layer_norm': 15360}), 'blocks.10': Counter({'linear': 64106496, 'matmul': 192384, 'layer_norm': 84480}), 'blocks.10.norm1': Counter({'layer_norm': 42240}), 'blocks.10.attn': Counter({'linear': 26357760, 'matmul': 185856}), 'blocks.10.attn.qkv': Counter({'linear': 19464192}), 'blocks.10.attn.attn_drop': Counter(), 'blocks.10.attn.proj': 
Counter({'linear': 6488064}), 'blocks.10.attn.proj_drop': Counter(), 'blocks.10.attn.LoRA_a': Counter({'linear': 101376}), 'blocks.10.attn.LoRA_b': Counter({'linear': 304128}), 'blocks.10.attn.LoRA_drop': Counter(), 'blocks.10.attn.prefix_drop': Counter(), 'blocks.10.drop_path': Counter(), 'blocks.10.norm2': Counter({'layer_norm': 30720}), 'blocks.10.mlp': Counter({'linear': 37748736}), 'blocks.10.mlp.fc1': Counter({'linear': 18874368}), 'blocks.10.mlp.act': Counter(), 'blocks.10.mlp.drop1': Counter(), 'blocks.10.mlp.fc2': Counter({'linear': 18874368}), 'blocks.10.mlp.drop2': Counter(), 'blocks.10.drop_prompt': Counter(), 'blocks.10.adapter': Counter(), 'blocks.10.adapter.dropout': Counter(), 'blocks.10.pyra_norm': Counter({'layer_norm': 11520}), 'blocks.11': Counter({'linear': 42762240, 'matmul': 103936, 'layer_norm': 61440}), 'blocks.11.norm1': Counter({'layer_norm': 30720}), 'blocks.11.attn': Counter({'linear': 19169280, 'matmul': 98304}), 'blocks.11.attn.qkv': Counter({'linear': 14155776}), 'blocks.11.attn.attn_drop': Counter(), 'blocks.11.attn.proj': Counter({'linear': 4718592}), 'blocks.11.attn.proj_drop': Counter(), 'blocks.11.attn.LoRA_a': Counter({'linear': 73728}), 'blocks.11.attn.LoRA_b': Counter({'linear': 221184}), 'blocks.11.attn.LoRA_drop': Counter(), 'blocks.11.attn.prefix_drop': Counter(), 'blocks.11.drop_path': Counter(), 'blocks.11.norm2': Counter({'layer_norm': 19200}), 'blocks.11.mlp': Counter({'linear': 23592960}), 'blocks.11.mlp.fc1': Counter({'linear': 11796480}), 'blocks.11.mlp.act': Counter(), 'blocks.11.mlp.drop1': Counter(), 'blocks.11.mlp.fc2': Counter({'linear': 11796480}), 'blocks.11.mlp.drop2': Counter(), 'blocks.11.drop_prompt': Counter(), 'blocks.11.adapter': Counter(), 'blocks.11.adapter.dropout': Counter(), 'blocks.11.pyra_norm': Counter({'layer_norm': 11520}), 'norm': Counter({'layer_norm': 19200}), 'pre_logits': Counter(), 'head': Counter({'linear': 78336}), 'drop_prompt': Counter()} number of params: 548646 Start training 
Epoch: [0] [ 0/31] eta: 0:02:26 lr: 0.000001 loss: 4.6339 (4.6339) time: 4.7249 data: 4.4059 max mem: 5990
Epoch: [0] [10/31] eta: 0:00:10 lr: 0.000001 loss: 4.6339 (4.6325) time: 0.4892 data: 0.4007 max mem: 5990
Epoch: [0] [20/31] eta: 0:00:03 lr: 0.000001 loss: 4.6296 (4.6302) time: 0.0676 data: 0.0046 max mem: 5990
Epoch: [0] [30/31] eta: 0:00:00 lr: 0.000001 loss: 4.6296 (4.6323) time: 0.0646 data: 0.0046 max mem: 5990
Epoch: [0] Total time: 0:00:07 (0.2279 s / it)
Averaged stats: lr: 0.000001 loss: 4.6296 (4.6323)
Test: [ 0/96] eta: 0:03:20 loss: 4.6544 (4.6544) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 2.0890 data: 2.0272 max mem: 5990
Test: [10/96] eta: 0:00:30 loss: 4.6214 (4.6226) acc1: 0.0000 (1.2784) acc5: 6.2500 (5.9659) time: 0.3511 data: 0.3042 max mem: 5990
Test: [20/96] eta: 0:00:20 loss: 4.6202 (4.6247) acc1: 0.0000 (1.1905) acc5: 6.2500 (5.8036) time: 0.1776 data: 0.1393 max mem: 5990
Test: [30/96] eta: 0:00:16 loss: 4.6202 (4.6225) acc1: 1.5625 (1.3105) acc5: 6.2500 (6.1492) time: 0.1829 data: 0.1519 max mem: 5990
Test: [40/96] eta: 0:00:12 loss: 4.6201 (4.6215) acc1: 1.5625 (1.2576) acc5: 6.2500 (6.3643) time: 0.1774 data: 0.1468 max mem: 5990
Test: [50/96] eta: 0:00:14 loss: 4.6167 (4.6220) acc1: 1.5625 (1.2561) acc5: 7.8125 (6.4032) time: 0.4129 data: 0.3825 max mem: 5990
Test: [60/96] eta: 0:00:11 loss: 4.6241 (4.6227) acc1: 1.5625 (1.3064) acc5: 7.8125 (6.4805) time: 0.4734 data: 0.4429 max mem: 5990
Test: [70/96] eta: 0:00:08 loss: 4.6203 (4.6224) acc1: 1.5625 (1.3424) acc5: 6.2500 (6.5801) time: 0.4308 data: 0.4001 max mem: 5990
Test: [80/96] eta: 0:00:05 loss: 4.6192 (4.6231) acc1: 1.5625 (1.3310) acc5: 4.6875 (6.3465) time: 0.4544 data: 0.4239 max mem: 5990
Test: [90/96] eta: 0:00:02 loss: 4.6195 (4.6224) acc1: 1.5625 (1.3736) acc5: 6.2500 (6.5076) time: 0.2991 data: 0.2687 max mem: 5990
Test: [95/96] eta: 0:00:00 loss: 4.6195 (4.6220) acc1: 1.5625 (1.3640) acc5: 6.2500 (6.5900) time: 0.3214 data: 0.2909 max mem: 5990
Test: Total time: 0:00:31 (0.3302 s / it)
- Acc@1 1.364 Acc@5 6.590 loss 4.622
Accuracy of the network on the 6085 test images: 1.4%
Max accuracy: 1.36%
Epoch: [1] [ 0/31] eta: 0:01:28 lr: 0.000001 loss: 4.6310 (4.6310) time: 2.8510 data: 2.7240 max mem: 5990
Epoch: [1] [10/31] eta: 0:00:08 lr: 0.000001 loss: 4.6292 (4.6282) time: 0.3916 data: 0.3283 max mem: 5990
Epoch: [1] [20/31] eta: 0:00:03 lr: 0.000001 loss: 4.6274 (4.6321) time: 0.1584 data: 0.1013 max mem: 5990
Epoch: [1] [30/31] eta: 0:00:00 lr: 0.000001 loss: 4.6285 (4.6327) time: 0.1333 data: 0.0770 max mem: 5990
Epoch: [1] Total time: 0:00:07 (0.2501 s / it)
Averaged stats: lr: 0.000001 loss: 4.6285 (4.6327)
Epoch: [2] [ 0/31] eta: 0:01:26 lr: 0.000151 loss: 4.6532 (4.6532) time: 2.7789 data: 2.6932 max mem: 5990
Epoch: [2] [10/31] eta: 0:00:08 lr: 0.000151 loss: 4.6293 (4.6245) time: 0.4026 data: 0.3446 max mem: 5990
Epoch: [2] [20/31] eta: 0:00:03 lr: 0.000151 loss: 4.6065 (4.6126) time: 0.1687 data: 0.1101 max mem: 5990
Epoch: [2] [30/31] eta: 0:00:00 lr: 0.000151 loss: 4.5929 (4.6055) time: 0.1253 data: 0.0566 max mem: 5990
Epoch: [2] Total time: 0:00:07 (0.2460 s / it)
Averaged stats: lr: 0.000151 loss: 4.5929 (4.6055)
Epoch: [3] [ 0/31] eta: 0:01:27 lr: 0.000301 loss: 4.5709 (4.5709) time: 2.8342 data: 2.7405 max mem: 5990
Epoch: [3] [10/31] eta: 0:00:07 lr: 0.000301 loss: 4.5674 (4.5617) time: 0.3490 data: 0.2734 max mem: 5990
Epoch: [3] [20/31] eta: 0:00:02 lr: 0.000301 loss: 4.5426 (4.5455) time: 0.1000 data: 0.0260 max mem: 5990
Epoch: [3] [30/31] eta: 0:00:00 lr: 0.000301 loss: 4.5200 (4.5302) time: 0.0860 data: 0.0127 max mem: 5990
Epoch: [3] Total time: 0:00:06 (0.2088 s / it)
Averaged stats: lr: 0.000301 loss: 4.5200 (4.5302)
Epoch: [4] [ 0/31] eta: 0:01:33 lr: 0.000451 loss: 4.4249 (4.4249) time: 3.0064 data: 2.9210 max mem: 5990
Epoch: [4] [10/31] eta: 0:00:08 lr: 0.000451 loss: 4.4249 (4.4226) time: 0.4053 data: 0.3270 max mem: 5990
Epoch: [4] [20/31] eta: 0:00:02 lr: 0.000451 loss: 4.4041 (4.3995) time: 0.1303 data: 0.0533 max mem: 5990
Epoch: [4] [30/31] eta: 0:00:00 lr: 0.000451 loss: 4.3344 (4.3709) time: 0.0950 data: 0.0195 max mem: 5990
Epoch: [4] Total time: 0:00:07 (0.2292 s / it)
Averaged stats: lr: 0.000451 loss: 4.3344 (4.3709)
Epoch: [5] [ 0/31] eta: 0:01:36 lr: 0.000601 loss: 4.2519 (4.2519) time: 3.1034 data: 3.0119 max mem: 5990
Epoch: [5] [10/31] eta: 0:00:07 lr: 0.000601 loss: 4.2325 (4.2076) time: 0.3798 data: 0.3119 max mem: 5990
Epoch: [5] [20/31] eta: 0:00:02 lr: 0.000601 loss: 4.1476 (4.1575) time: 0.1126 data: 0.0481 max mem: 5990
Epoch: [5] [30/31] eta: 0:00:00 lr: 0.000601 loss: 4.0326 (4.0860) time: 0.1035 data: 0.0408 max mem: 5990
Epoch: [5] Total time: 0:00:07 (0.2275 s / it)
Averaged stats: lr: 0.000601 loss: 4.0326 (4.0860)
Epoch: [6] [ 0/31] eta: 0:01:25 lr: 0.000751 loss: 3.8179 (3.8179) time: 2.7439 data: 2.6598 max mem: 5990
Epoch: [6] [10/31] eta: 0:00:07 lr: 0.000751 loss: 3.7658 (3.7508) time: 0.3355 data: 0.2703 max mem: 5990
Epoch: [6] [20/31] eta: 0:00:02 lr: 0.000751 loss: 3.6259 (3.6537) time: 0.1201 data: 0.0555 max mem: 5990
Epoch: [6] [30/31] eta: 0:00:00 lr: 0.000751 loss: 3.4356 (3.5614) time: 0.1188 data: 0.0551 max mem: 5990
Epoch: [6] Total time: 0:00:06 (0.2205 s / it)
Averaged stats: lr: 0.000751 loss: 3.4356 (3.5614)
Epoch: [7] [ 0/31] eta: 0:01:13 lr: 0.000900 loss: 3.2291 (3.2291) time: 2.3702 data: 2.2823 max mem: 5990
Epoch: [7] [10/31] eta: 0:00:06 lr: 0.000900 loss: 3.0378 (3.0480) time: 0.3283 data: 0.2619 max mem: 5990
Epoch: [7] [20/31] eta: 0:00:02 lr: 0.000900 loss: 2.8062 (2.8795) time: 0.1400 data: 0.0756 max mem: 5990
Epoch: [7] [30/31] eta: 0:00:00 lr: 0.000900 loss: 2.6863 (2.7895) time: 0.1244 data: 0.0601 max mem: 5990
Epoch: [7] Total time: 0:00:06 (0.2207 s / it)
Averaged stats: lr: 0.000900 loss: 2.6863 (2.7895)
Epoch: [8] [ 0/31] eta: 0:01:11 lr: 0.001050 loss: 2.2693 (2.2693) time: 2.3110 data: 2.2295 max mem: 5990
Epoch: [8] [10/31] eta: 0:00:06 lr: 0.001050 loss: 2.1118 (2.1269) time: 0.3282 data: 0.2649 max mem: 5990
Epoch: [8] [20/31] eta: 0:00:02 lr: 0.001050 loss: 1.9326 (1.9955) time: 0.1478 data: 0.0902 max mem: 5990
Epoch: [8] [30/31] eta: 0:00:00 lr: 0.001050 loss: 1.7223 (1.9044) time: 0.1308 data: 0.0742 max mem: 5990
Epoch: [8] Total time: 0:00:06 (0.2258 s / it)
Averaged stats: lr: 0.001050 loss: 1.7223 (1.9044)
Epoch: [9] [ 0/31] eta: 0:01:04 lr: 0.001200 loss: 1.4659 (1.4659) time: 2.0788 data: 2.0195 max mem: 5990
Epoch: [9] [10/31] eta: 0:00:07 lr: 0.001200 loss: 1.3131 (1.3333) time: 0.3498 data: 0.2913 max mem: 5990
Epoch: [9] [20/31] eta: 0:00:02 lr: 0.001200 loss: 1.1918 (1.2264) time: 0.1741 data: 0.1129 max mem: 5990
Epoch: [9] [30/31] eta: 0:00:00 lr: 0.001200 loss: 1.1147 (1.1966) time: 0.1393 data: 0.0716 max mem: 5990
Epoch: [9] Total time: 0:00:07 (0.2335 s / it)
Averaged stats: lr: 0.001200 loss: 1.1147 (1.1966)
Epoch: [10] [ 0/31] eta: 0:01:13 lr: 0.001350 loss: 0.9914 (0.9914) time: 2.3695 data: 2.2925 max mem: 5990
Epoch: [10] [10/31] eta: 0:00:07 lr: 0.001350 loss: 0.7816 (0.7915) time: 0.3680 data: 0.3065 max mem: 5990
Epoch: [10] [20/31] eta: 0:00:03 lr: 0.001350 loss: 0.6986 (0.7441) time: 0.1746 data: 0.1121 max mem: 5990
Epoch: [10] [30/31] eta: 0:00:00 lr: 0.001350 loss: 0.6751 (0.7196) time: 0.1293 data: 0.0650 max mem: 5990
Epoch: [10] Total time: 0:00:07 (0.2277 s / it)
Averaged stats: lr: 0.001350 loss: 0.6751 (0.7196)
Test: [ 0/96] eta: 0:08:01 loss: 0.6214 (0.6214) acc1: 90.6250 (90.6250) acc5: 96.8750 (96.8750) time: 5.0115 data: 4.9566 max mem: 5990
Test: [10/96] eta: 0:01:02 loss: 0.8055 (0.8116) acc1: 85.9375 (85.0852) acc5: 93.7500 (92.7557) time: 0.7237 data: 0.6912 max mem: 5990
Test: [20/96] eta: 0:00:40 loss: 0.8291 (0.8349) acc1: 82.8125 (83.5565) acc5: 92.1875 (92.5595) time: 0.3126 data: 0.2820 max mem: 5990
Test: [30/96] eta: 0:00:31 loss: 0.8368 (0.8468) acc1: 82.8125 (83.6694) acc5: 92.1875 (92.4899) time: 0.3393 data: 0.3087 max mem: 5990
Test: [40/96] eta: 0:00:23 loss: 0.8162 (0.8346) acc1: 82.8125 (83.6128) acc5: 93.7500 (92.9116) time: 0.3059 data: 0.2756 max mem: 5990
Test: [50/96] eta: 0:00:19 loss: 0.7516 (0.8330) acc1: 85.9375 (83.7010) acc5: 93.7500 (93.0147) time: 0.3237 data: 0.2934 max mem: 5990
Test: [60/96] eta: 0:00:14 loss: 0.7516 (0.8239) acc1: 85.9375 (83.9652) acc5: 93.7500 (93.1865) time: 0.3673 data: 0.3371 max mem: 5990
Test: [70/96] eta: 0:00:10 loss: 0.7138 (0.8121) acc1: 84.3750 (84.2430) acc5: 95.3125 (93.4419) time: 0.4122 data: 0.3820 max mem: 5990
Test: [80/96] eta: 0:00:06 loss: 0.8005 (0.8170) acc1: 82.8125 (84.1049) acc5: 93.7500 (93.3835) time: 0.4055 data: 0.3752 max mem: 5990
Test: [90/96] eta: 0:00:02 loss: 0.8171 (0.8200) acc1: 82.8125 (84.1346) acc5: 93.7500 (93.3723) time: 0.3049 data: 0.2747 max mem: 5990
Test: [95/96] eta: 0:00:00 loss: 0.7974 (0.8123) acc1: 85.9375 (84.4043) acc5: 93.7500 (93.4758) time: 0.3441 data: 0.3146 max mem: 5990
Test: Total time: 0:00:37 (0.3882 s / it)
- Acc@1 84.404 Acc@5 93.476 loss 0.812
Accuracy of the network on the 6085 test images: 84.4%
Max accuracy: 84.40%
Epoch: [11] [ 0/31] eta: 0:01:38 lr: 0.001464 loss: 0.5327 (0.5327) time: 3.1782 data: 3.0993 max mem: 5990
Epoch: [11] [10/31] eta: 0:00:09 lr: 0.001464 loss: 0.5032 (0.4990) time: 0.4375 data: 0.3737 max mem: 5990
Epoch: [11] [20/31] eta: 0:00:03 lr: 0.001464 loss: 0.4836 (0.4740) time: 0.1649 data: 0.1016 max mem: 5990
Epoch: [11] [30/31] eta: 0:00:00 lr: 0.001464 loss: 0.4549 (0.4714) time: 0.1288 data: 0.0662 max mem: 5990
Epoch: [11] Total time: 0:00:08 (0.2628 s / it)
Averaged stats: lr: 0.001464 loss: 0.4549 (0.4714)
Epoch: [12] [ 0/31] eta: 0:01:15 lr: 0.001456 loss: 0.3363 (0.3363) time: 2.4303 data: 2.3377 max mem: 5990
Epoch: [12] [10/31] eta: 0:00:06 lr: 0.001456 loss: 0.3599 (0.3702) time: 0.3018 data: 0.2368 max mem: 5990
Epoch: [12] [20/31] eta: 0:00:02 lr: 0.001456 loss: 0.3561 (0.3547) time: 0.1028 data: 0.0397 max mem: 5990
Epoch: [12] [30/31] eta: 0:00:00 lr: 0.001456 loss: 0.2947 (0.3350) time: 0.0903 data: 0.0270 max mem: 5990
Epoch: [12] Total time: 0:00:05 (0.1897 s / it)
Averaged stats: lr: 0.001456 loss: 0.2947 (0.3350)
Epoch: [13] [ 0/31] eta: 0:01:27 lr: 0.001448 loss: 0.2183 (0.2183) time: 2.8333 data: 2.7639 max mem: 5990
Epoch: [13] [10/31] eta: 0:00:08 lr: 0.001448 loss: 0.2390 (0.2512) time: 0.3876 data: 0.3263 max mem: 5990
Epoch: [13] [20/31] eta: 0:00:02 lr: 0.001448 loss: 0.2330 (0.2360) time: 0.1372 data: 0.0718 max mem: 5990
Epoch: [13] [30/31] eta: 0:00:00 lr: 0.001448 loss: 0.2330 (0.2411) time: 0.0959 data: 0.0306 max mem: 5990
Epoch: [13] Total time: 0:00:07 (0.2261 s / it)
Averaged stats: lr: 0.001448 loss: 0.2330 (0.2411)
Epoch: [14] [ 0/31] eta: 0:01:31 lr: 0.001439 loss: 0.2780 (0.2780) time: 2.9360 data: 2.8514 max mem: 5990
Epoch: [14] [10/31] eta: 0:00:07 lr: 0.001439 loss: 0.1618 (0.1779) time: 0.3654 data: 0.2989 max mem: 5990
Epoch: [14] [20/31] eta: 0:00:02 lr: 0.001439 loss: 0.1737 (0.1867) time: 0.1166 data: 0.0523 max mem: 5990
Epoch: [14] [30/31] eta: 0:00:00 lr: 0.001439 loss: 0.1699 (0.1818) time: 0.1089 data: 0.0456 max mem: 5990
Epoch: [14] Total time: 0:00:06 (0.2236 s / it)
Averaged stats: lr: 0.001439 loss: 0.1699 (0.1818)
Epoch: [15] [ 0/31] eta: 0:01:26 lr: 0.001429 loss: 0.2323 (0.2323) time: 2.7827 data: 2.7116 max mem: 5990
Epoch: [15] [10/31] eta: 0:00:07 lr: 0.001429 loss: 0.1554 (0.1684) time: 0.3546 data: 0.2922 max mem: 5990
Epoch: [15] [20/31] eta: 0:00:02 lr: 0.001429 loss: 0.1473 (0.1611) time: 0.1168 data: 0.0547 max mem: 5990
Epoch: [15] [30/31] eta: 0:00:00 lr: 0.001429 loss: 0.1599 (0.1609) time: 0.1053 data: 0.0463 max mem: 5990
Epoch: [15] Total time: 0:00:06 (0.2169 s / it)
Averaged stats: lr: 0.001429 loss: 0.1599 (0.1609)
Epoch: [16] [ 0/31] eta: 0:01:16 lr: 0.001419 loss: 0.0864 (0.0864) time: 2.4760 data: 2.3862 max mem: 5990
Epoch: [16] [10/31] eta: 0:00:07 lr: 0.001419 loss: 0.1139 (0.1270) time: 0.3349 data: 0.2714 max mem: 5990
Epoch: [16] [20/31] eta: 0:00:02 lr: 0.001419 loss: 0.1175 (0.1281) time: 0.1440 data: 0.0828 max mem: 5990
Epoch: [16] [30/31] eta: 0:00:00 lr: 0.001419 loss: 0.1296 (0.1392) time: 0.1351 data: 0.0740 max mem: 5990
Epoch: [16] Total time: 0:00:07 (0.2312 s / it)
Averaged stats: lr: 0.001419 loss: 0.1296 (0.1392)
Epoch: [17] [ 0/31] eta: 0:01:15 lr: 0.001408 loss: 0.1381 (0.1381) time: 2.4268 data: 2.3458 max mem: 5990
Epoch: [17] [10/31] eta: 0:00:07 lr: 0.001408 loss: 0.1098 (0.1235) time: 0.3640 data: 0.3004 max mem: 5990
Epoch: [17] [20/31] eta: 0:00:02 lr: 0.001408 loss: 0.1050 (0.1240) time: 0.1638 data: 0.1006 max mem: 5990
Epoch: [17] [30/31] eta: 0:00:00 lr: 0.001408 loss: 0.1050 (0.1182) time: 0.1395 data: 0.0764 max mem: 5990
Epoch: [17] Total time: 0:00:07 (0.2321 s / it)
Averaged stats: lr: 0.001408 loss: 0.1050 (0.1182)
Epoch: [18] [ 0/31] eta: 0:00:52 lr: 0.001396 loss: 0.0990 (0.0990) time: 1.6973 data: 1.6236 max mem: 5990
Epoch: [18] [10/31] eta: 0:00:06 lr: 0.001396 loss: 0.0870 (0.0926) time: 0.3172 data: 0.2516 max mem: 5990
Epoch: [18] [20/31] eta: 0:00:02 lr: 0.001396 loss: 0.0966 (0.1091) time: 0.1754 data: 0.1112 max mem: 5990
Epoch: [18] [30/31] eta: 0:00:00 lr: 0.001396 loss: 0.1040 (0.1070) time: 0.1721 data: 0.1078 max mem: 5990
Epoch: [18] Total time: 0:00:07 (0.2493 s / it)
Averaged stats: lr: 0.001396 loss: 0.1040 (0.1070)
Epoch: [19] [ 0/31] eta: 0:01:32 lr: 0.001384 loss: 0.0780 (0.0780) time: 2.9926 data: 2.9177 max mem: 5990
Epoch: [19] [10/31] eta: 0:00:09 lr: 0.001384 loss: 0.0780 (0.0738) time: 0.4344 data: 0.3709 max mem: 5990
Epoch: [19] [20/31] eta: 0:00:03 lr: 0.001384 loss: 0.0748 (0.0762) time: 0.1645 data: 0.1015 max mem: 5990
Epoch: [19] [30/31] eta: 0:00:00 lr: 0.001384 loss: 0.0748 (0.0791) time: 0.1061 data: 0.0434 max mem: 5990
Epoch: [19] Total time: 0:00:07 (0.2399 s / it)
Averaged stats: lr: 0.001384 loss: 0.0748 (0.0791)
Epoch: [20] [ 0/31] eta: 0:01:36 lr: 0.001371 loss: 0.0624 (0.0624) time: 3.1215 data: 3.0477 max mem: 5990
Epoch: [20] [10/31] eta: 0:00:09 lr: 0.001371 loss: 0.0534 (0.0567) time: 0.4319 data: 0.3744 max mem: 5990
Epoch: [20] [20/31] eta: 0:00:02 lr: 0.001371 loss: 0.0717 (0.0782) time: 0.1287 data: 0.0689 max mem: 5990
Epoch: [20] [30/31] eta: 0:00:00 lr: 0.001371 loss: 0.0789 (0.0793) time: 0.0778 data: 0.0155 max mem: 5990
Epoch: [20] Total time: 0:00:07 (0.2297 s / it)
Averaged stats: lr: 0.001371 loss: 0.0789 (0.0793)
Test: [ 0/96] eta: 0:07:25 loss: 0.3063 (0.3063) acc1: 92.1875 (92.1875) acc5: 98.4375 (98.4375) time: 4.6374 data: 4.5929 max mem: 5990
Test: [10/96] eta: 0:00:58 loss: 0.4185 (0.4360) acc1: 89.0625 (88.6364) acc5: 96.8750 (97.0170) time: 0.6815 data: 0.6499 max mem: 5990
Test: [20/96] eta: 0:00:40 loss: 0.4213 (0.4393) acc1: 89.0625 (88.5417) acc5: 98.4375 (97.6190) time: 0.3316 data: 0.3012 max mem: 5990
Test: [30/96] eta: 0:00:32 loss: 0.4808 (0.4517) acc1: 87.5000 (88.4073) acc5: 96.8750 (97.1774) time: 0.3860 data: 0.3555 max mem: 5990
Test: [40/96] eta: 0:00:25 loss: 0.4052 (0.4308) acc1: 89.0625 (88.9482) acc5: 98.4375 (97.5991) time: 0.3691 data: 0.3380 max mem: 5990
Test: [50/96] eta: 0:00:20 loss: 0.3851 (0.4331) acc1: 89.0625 (88.7255) acc5: 98.4375 (97.4571) time: 0.3618 data: 0.3302 max mem: 5990
Test: [60/96] eta: 0:00:15 loss: 0.4071 (0.4294) acc1: 89.0625 (89.0113) acc5: 96.8750 (97.5154) time: 0.3687 data: 0.3371 max mem: 5990
Test: [70/96] eta: 0:00:11 loss: 0.3772 (0.4228) acc1: 90.6250 (89.1285) acc5: 98.4375 (97.5352) time: 0.4226 data: 0.3913 max mem: 5990
Test: [80/96] eta: 0:00:06 loss: 0.4113 (0.4296) acc1: 89.0625 (88.9853) acc5: 96.8750 (97.4344) time: 0.4115 data: 0.3806 max mem: 5990
Test: [90/96] eta: 0:00:02 loss: 0.4430 (0.4303) acc1: 89.0625 (89.0968) acc5: 96.8750 (97.4245) time: 0.3044 data: 0.2739 max mem: 5990
Test: [95/96] eta: 0:00:00 loss: 0.4043 (0.4200) acc1: 90.6250 (89.2851) acc5: 98.4375 (97.5021) time: 0.3383 data: 0.3083 max mem: 5990
Test: Total time: 0:00:38 (0.4014 s / it)
- Acc@1 89.285 Acc@5 97.502 loss 0.420
Accuracy of the network on the 6085 test images: 89.3%
Max accuracy: 89.29%
Epoch: [21] [ 0/31] eta: 0:01:14 lr: 0.001358 loss: 0.0384 (0.0384) time: 2.4033 data: 2.2991 max mem: 5990
Epoch: [21] [10/31] eta: 0:00:07 lr: 0.001358 loss: 0.0687 (0.0785) time: 0.3542 data: 0.2774 max mem: 5990
Epoch: [21] [20/31] eta: 0:00:02 lr: 0.001358 loss: 0.0666 (0.0758) time: 0.1625 data: 0.0865 max mem: 5990
Epoch: [21] [30/31] eta: 0:00:00 lr: 0.001358 loss: 0.0565 (0.0700) time: 0.1389 data: 0.0625 max mem: 5990
Epoch: [21] Total time: 0:00:07 (0.2347 s / it)
Averaged stats: lr: 0.001358 loss: 0.0565 (0.0700)
Epoch: [22] [ 0/31] eta: 0:01:16 lr: 0.001344 loss: 0.0409 (0.0409) time: 2.4532 data: 2.3489 max mem: 5990
Epoch: [22] [10/31] eta: 0:00:07 lr: 0.001344 loss: 0.0503 (0.0529) time: 0.3741 data: 0.2961 max mem: 5990
Epoch: [22] [20/31] eta: 0:00:03 lr: 0.001344 loss: 0.0552 (0.0615) time: 0.1739 data: 0.0990 max mem: 5990
Epoch: [22] [30/31] eta: 0:00:00 lr: 0.001344 loss: 0.0528 (0.0615) time: 0.1351 data: 0.0610 max mem: 5990
Epoch: [22] Total time: 0:00:07 (0.2345 s / it)
Averaged stats: lr: 0.001344 loss: 0.0528 (0.0615)
Epoch: [23] [ 0/31] eta: 0:00:58 lr: 0.001329 loss: 0.0337 (0.0337) time: 1.8896 data: 1.8133 max mem: 5990
Epoch: [23] [10/31] eta: 0:00:07 lr: 0.001329 loss: 0.0448 (0.0544) time: 0.3467 data: 0.2713 max mem: 5990
Epoch: [23] [20/31] eta: 0:00:02 lr: 0.001329 loss: 0.0543 (0.0564) time: 0.1768 data: 0.1051 max mem: 5990
Epoch: [23] [30/31] eta: 0:00:00 lr: 0.001329 loss: 0.0384 (0.0556) time: 0.1611 data: 0.0998 max mem: 5990
Epoch: [23] Total time: 0:00:07 (0.2516 s / it)
Averaged stats: lr: 0.001329 loss: 0.0384 (0.0556)
Epoch: [24] [ 0/31] eta: 0:01:42 lr: 0.001314 loss: 0.0350 (0.0350) time: 3.3095 data: 3.2283 max mem: 5990
Epoch: [24] [10/31] eta: 0:00:09 lr: 0.001314 loss: 0.0386 (0.0638) time: 0.4534 data: 0.3892 max mem: 5990
Epoch: [24] [20/31] eta: 0:00:03 lr: 0.001314 loss: 0.0386 (0.0556) time: 0.1491 data: 0.0859 max mem: 5990
Epoch: [24] [30/31] eta: 0:00:00 lr: 0.001314 loss: 0.0422 (0.0550) time: 0.0955 data: 0.0334 max mem: 5990
Epoch: [24] Total time: 0:00:07 (0.2419 s / it)
Averaged stats: lr: 0.001314 loss: 0.0422 (0.0550)
Epoch: [25] [ 0/31] eta: 0:01:43 lr: 0.001298 loss: 0.0624 (0.0624) time: 3.3376 data: 3.2580 max mem: 5990
Epoch: [25] [10/31] eta: 0:00:09 lr: 0.001298 loss: 0.0448 (0.0519) time: 0.4439 data: 0.3815 max mem: 5990
Epoch: [25] [20/31] eta: 0:00:03 lr: 0.001298 loss: 0.0422 (0.0515) time: 0.1294 data: 0.0698 max mem: 5990
Epoch: [25] [30/31] eta: 0:00:00 lr: 0.001298 loss: 0.0415 (0.0547) time: 0.0831 data: 0.0229 max mem: 5990
Epoch: [25] Total time: 0:00:07 (0.2363 s / it)
Averaged stats: lr: 0.001298 loss: 0.0415 (0.0547)
Epoch: [26] [ 0/31] eta: 0:01:37 lr: 0.001282 loss: 0.0202 (0.0202) time: 3.1353 data: 3.0531 max mem: 5990
Epoch: [26] [10/31] eta: 0:00:08 lr: 0.001282 loss: 0.0257 (0.0359) time: 0.4026 data: 0.3368 max mem: 5990
Epoch: [26] [20/31] eta: 0:00:02 lr: 0.001282 loss: 0.0311 (0.0360) time: 0.1273 data: 0.0637 max mem: 5990
Epoch: [26] [30/31] eta: 0:00:00 lr: 0.001282 loss: 0.0366 (0.0396) time: 0.0927 data: 0.0312 max mem: 5990
Epoch: [26] Total time: 0:00:07 (0.2276 s / it)
Averaged stats: lr: 0.001282 loss: 0.0366 (0.0396)
Epoch: [27] [ 0/31] eta: 0:01:35 lr: 0.001265 loss: 0.0276 (0.0276) time: 3.0721 data: 2.9815 max mem: 5990
Epoch: [27] [10/31] eta: 0:00:07 lr: 0.001265 loss: 0.0276 (0.0314) time: 0.3637 data: 0.2987 max mem: 5990
Epoch: [27] [20/31] eta: 0:00:02 lr: 0.001265 loss: 0.0334 (0.0355) time: 0.1089 data: 0.0466 max mem: 5990
Epoch: [27] [30/31] eta: 0:00:00 lr: 0.001265 loss: 0.0368 (0.0403) time: 0.0931 data: 0.0315 max mem: 5990
Epoch: [27] Total time: 0:00:06 (0.2116 s / it)
Averaged stats: lr: 0.001265 loss: 0.0368 (0.0403)
Epoch: [28] [ 0/31] eta: 0:01:42 lr: 0.001248 loss: 0.0598 (0.0598) time: 3.3145 data: 3.2474 max mem: 5990
Epoch: [28] [10/31] eta: 0:00:08 lr: 0.001248 loss: 0.0328 (0.0446) time: 0.4127 data: 0.3571 max mem: 5990
Epoch: [28] [20/31] eta: 0:00:02 lr: 0.001248 loss: 0.0401 (0.0528) time: 0.1199 data: 0.0621 max mem: 5990
Epoch: [28] [30/31] eta: 0:00:00 lr: 0.001248 loss: 0.0468 (0.0524) time: 0.1046 data: 0.0436 max mem: 5990
Epoch: [28] Total time: 0:00:07 (0.2397 s / it)
Averaged stats: lr: 0.001248 loss: 0.0468 (0.0524)
Epoch: [29] [ 0/31] eta: 0:01:14 lr: 0.001230 loss: 0.0238 (0.0238) time: 2.4000 data: 2.3094 max mem: 5990
Epoch: [29] [10/31] eta: 0:00:06 lr: 0.001230 loss: 0.0312 (0.0482) time: 0.3132 data: 0.2491 max mem: 5990
Epoch: [29] [20/31] eta: 0:00:02 lr: 0.001230 loss: 0.0426 (0.0514) time: 0.1263 data: 0.0640 max mem: 5990
Epoch: [29] [30/31] eta: 0:00:00 lr: 0.001230 loss: 0.0418 (0.0482) time: 0.1268 data: 0.0642 max mem: 5990
Epoch: [29] Total time: 0:00:06 (0.2198 s / it)
Averaged stats: lr: 0.001230 loss: 0.0418 (0.0482)
Epoch: [30] [ 0/31] eta: 0:01:13 lr: 0.001212 loss: 0.0272 (0.0272) time: 2.3817 data: 2.3113 max mem: 5990
Epoch: [30] [10/31] eta: 0:00:07 lr: 0.001212 loss: 0.0352 (0.0445) time: 0.3338 data: 0.2762 max mem: 5990
Epoch: [30] [20/31] eta: 0:00:02 lr: 0.001212 loss: 0.0324 (0.0390) time: 0.1505 data: 0.0937 max mem: 5990
Epoch: [30] [30/31] eta: 0:00:00 lr: 0.001212 loss: 0.0282 (0.0368) time: 0.1307 data: 0.0716 max mem: 5990
Epoch: [30] Total time: 0:00:07 (0.2281 s / it)
Averaged stats: lr: 0.001212 loss: 0.0282 (0.0368)
Test: [ 0/96] eta: 0:07:57 loss: 0.2329 (0.2329) acc1: 93.7500 (93.7500) acc5: 98.4375 (98.4375) time: 4.9704 data: 4.9301 max mem: 5990
Test: [10/96] eta: 0:01:09 loss: 0.4148 (0.4427) acc1: 89.0625 (88.3523) acc5: 96.8750 (96.8750) time: 0.8047 data: 0.7728 max mem: 5990
Test: [20/96] eta: 0:00:45 loss: 0.4088 (0.4242) acc1: 87.5000 (88.2440) acc5: 96.8750 (97.2470) time: 0.3768 data: 0.3462 max mem: 5990
Test: [30/96] eta: 0:00:34 loss: 0.4557 (0.4446) acc1: 87.5000 (88.3065) acc5: 96.8750 (97.0766) time: 0.3766 data: 0.3464 max mem: 5990
Test: [40/96] eta: 0:00:27 loss: 0.4072 (0.4280) acc1: 87.5000 (88.4146) acc5: 96.8750 (97.2180) time: 0.3683 data: 0.3380 max mem: 5990
Test: [50/96] eta: 0:00:21 loss: 0.3908 (0.4292) acc1: 89.0625 (88.3885) acc5: 96.8750 (97.0895) time: 0.3690 data: 0.3383 max mem: 5990
Test: [60/96] eta: 0:00:16 loss: 0.3812 (0.4267) acc1: 89.0625 (88.4221) acc5: 96.8750 (97.1568) time: 0.3656 data: 0.3349 max mem: 5990
Test: [70/96] eta: 0:00:11 loss: 0.3699 (0.4192) acc1: 89.0625 (88.5343) acc5: 96.8750 (97.1171) time: 0.3574 data: 0.3270 max mem: 5990
Test: [80/96] eta: 0:00:06 loss: 0.3702 (0.4245) acc1: 87.5000 (88.5031) acc5: 96.8750 (97.0293) time: 0.2921 data: 0.2617 max mem: 5990
Test: [90/96] eta: 0:00:02 loss: 0.4538 (0.4261) acc1: 89.0625 (88.5474) acc5: 96.8750 (96.9437) time: 0.2192 data: 0.1886 max mem: 5990
Test: [95/96] eta: 0:00:00 loss: 0.3437 (0.4153) acc1: 89.0625 (88.7921) acc5: 96.8750 (96.9762) time: 0.2677 data: 0.2378 max mem: 5990
Test: Total time: 0:00:36 (0.3854 s / it)
- Acc@1 88.792 Acc@5 96.976 loss 0.415
Accuracy of the network on the 6085 test images: 88.8%
Max accuracy: 89.29%
Epoch: [31] [ 0/31] eta: 0:01:03 lr: 0.001193 loss: 0.0157 (0.0157) time: 2.0459 data: 1.9780 max mem: 5990
Epoch: [31] [10/31] eta: 0:00:07 lr: 0.001193 loss: 0.0290 (0.0328) time: 0.3395 data: 0.2750 max mem: 5990
Epoch: [31] [20/31] eta: 0:00:02 lr: 0.001193 loss: 0.0253 (0.0331) time: 0.1777 data: 0.1130 max mem: 5990
Epoch: [31] [30/31] eta: 0:00:00 lr: 0.001193 loss: 0.0246 (0.0317) time: 0.1514 data: 0.0874 max mem: 5990
Epoch: [31] Total time: 0:00:07 (0.2321 s / it)
Averaged stats: lr: 0.001193 loss: 0.0246 (0.0317)
Epoch: [32] [ 0/31] eta: 0:00:54 lr: 0.001174 loss: 0.0448 (0.0448) time: 1.7728 data: 1.7013 max mem: 5990
Epoch: [32] [10/31] eta: 0:00:06 lr: 0.001174 loss: 0.0386 (0.0411) time: 0.3183 data: 0.2606 max mem: 5990
Epoch: [32] [20/31] eta: 0:00:02 lr: 0.001174 loss: 0.0278 (0.0397) time: 0.1662 data: 0.1103 max mem: 5990
Epoch: [32] [30/31] eta: 0:00:00 lr: 0.001174 loss: 0.0313 (0.0399) time: 0.1702 data: 0.1115 max mem: 5990
Epoch: [32] Total time: 0:00:07 (0.2413 s / it)
Averaged stats: lr: 0.001174 loss: 0.0313 (0.0399)
Epoch: [33] [ 0/31] eta: 0:01:36 lr: 0.001154 loss: 0.0161 (0.0161) time: 3.1056 data: 3.0193 max mem: 5990
Epoch: [33] [10/31] eta: 0:00:09 lr: 0.001154 loss: 0.0184 (0.0365) time: 0.4353 data: 0.3698 max mem: 5990
Epoch: [33] [20/31] eta: 0:00:03 lr: 0.001154 loss: 0.0184 (0.0331) time: 0.1711 data: 0.1071 max mem: 5990
Epoch: [33] [30/31] eta: 0:00:00 lr: 0.001154 loss: 0.0202 (0.0308) time: 0.1185 data: 0.0547 max mem: 5990
Epoch: [33] Total time: 0:00:07 (0.2451 s / it)
Averaged stats: lr: 0.001154 loss: 0.0202 (0.0308)
Epoch: [34] [ 0/31] eta: 0:01:33 lr: 0.001134 loss: 0.0375 (0.0375) time: 3.0311 data: 2.9436 max mem: 5990
Epoch: [34] [10/31] eta: 0:00:09 lr: 0.001134 loss: 0.0231 (0.0361) time: 0.4437 data: 0.3777 max mem: 5990
Epoch: [34] [20/31] eta: 0:00:03 lr: 0.001134 loss: 0.0277 (0.0426) time: 0.1559 data: 0.0924 max mem: 5990
Epoch: [34] [30/31] eta: 0:00:00 lr: 0.001134 loss: 0.0277 (0.0398) time: 0.0939 data: 0.0319 max mem: 5990
Epoch: [34] Total time: 0:00:07 (0.2424 s / it)
Averaged stats: lr: 0.001134 loss: 0.0277 (0.0398)
Epoch: [35] [ 0/31] eta: 0:01:40 lr: 0.001114 loss: 0.0171 (0.0171) time: 3.2433 data: 3.1697 max mem: 5990
Epoch: [35] [10/31] eta: 0:00:08 lr: 0.001114 loss: 0.0204 (0.0385) time: 0.4075 data: 0.3452 max mem: 5990
Epoch: [35] [20/31] eta: 0:00:02 lr: 0.001114 loss: 0.0245 (0.0391) time: 0.1181 data: 0.0570 max mem: 5990
Epoch: [35] [30/31] eta: 0:00:00 lr: 0.001114 loss: 0.0279 (0.0401) time: 0.0869 data: 0.0256 max mem: 5990
Epoch: [35] Total time: 0:00:06 (0.2235 s / it)
Averaged stats: lr: 0.001114 loss: 0.0279 (0.0401)
Epoch: [36] [ 0/31] eta: 0:01:34 lr: 0.001093 loss: 0.0119 (0.0119) time: 3.0579 data: 2.9768 max mem: 5990
Epoch: [36] [10/31] eta: 0:00:08 lr: 0.001093 loss: 0.0167 (0.0262) time: 0.3946 data: 0.3298 max mem: 5990
Epoch: [36] [20/31] eta: 0:00:02 lr: 0.001093 loss: 0.0230 (0.0330) time: 0.1268 data: 0.0634 max mem: 5990
Epoch: [36] [30/31] eta: 0:00:00 lr: 0.001093 loss: 0.0257 (0.0313) time: 0.1118 data: 0.0494 max mem: 5990
Epoch: [36] Total time: 0:00:07 (0.2364 s / it)
Averaged stats: lr: 0.001093 loss: 0.0257 (0.0313)
Epoch: [37] [ 0/31] eta: 0:01:13 lr: 0.001072 loss: 0.0221 (0.0221) time: 2.3600 data: 2.2796 max mem: 5990
Epoch: [37] [10/31] eta: 0:00:06 lr: 0.001072 loss: 0.0218 (0.0208) time: 0.3222 data: 0.2660 max mem: 5990
Epoch: [37] [20/31] eta: 0:00:02 lr: 0.001072 loss: 0.0190 (0.0223) time: 0.1409 data: 0.0857 max mem: 5990
Epoch: [37] [30/31] eta: 0:00:00 lr: 0.001072 loss: 0.0165 (0.0207) time: 0.1319 data: 0.0731 max mem: 5990
Epoch: [37] Total time: 0:00:07 (0.2306 s / it)
Averaged stats: lr: 0.001072 loss: 0.0165 (0.0207)
Epoch: [38] [ 0/31] eta: 0:01:16 lr: 0.001051 loss: 0.0201 (0.0201) time: 2.4784 data: 2.3940 max mem: 5990
Epoch: [38] [10/31] eta: 0:00:08 lr: 0.001051 loss: 0.0197 (0.0193) time: 0.3831 data: 0.3178 max mem: 5990
Epoch: [38] [20/31] eta: 0:00:03 lr: 0.001051 loss: 0.0184 (0.0214) time: 0.1711 data: 0.1086 max mem: 5990
Epoch: [38] [30/31] eta: 0:00:00 lr: 0.001051 loss: 0.0163 (0.0230) time: 0.1330 data: 0.0711 max mem: 5990
Epoch: [38] Total time: 0:00:07 (0.2367 s / it)
Averaged stats: lr: 0.001051 loss: 0.0163 (0.0230)
Epoch: [39] [ 0/31] eta: 0:00:59 lr: 0.001029 loss: 0.0070 (0.0070) time: 1.9292 data: 1.8645 max mem: 5990
Epoch: [39] [10/31] eta: 0:00:06 lr: 0.001029 loss: 0.0129 (0.0209) time: 0.3267 data: 0.2625 max mem: 5990
Epoch: [39] [20/31] eta: 0:00:02 lr: 0.001029 loss: 0.0136 (0.0195) time: 0.1714 data: 0.1078 max mem: 5990
Epoch: [39] [30/31] eta: 0:00:00 lr: 0.001029 loss: 0.0153 (0.0194) time: 0.1693 data: 0.1068 max mem: 5990
Epoch: [39] Total time: 0:00:07 (0.2496 s / it)
Averaged stats: lr: 0.001029 loss: 0.0153 (0.0194)
Epoch: [40] [ 0/31] eta: 0:01:31 lr: 0.001007 loss: 0.0483 (0.0483) time: 2.9486 data: 2.8596 max mem: 5990
Epoch: [40] [10/31] eta: 0:00:08 lr: 0.001007 loss: 0.0182 (0.0228) time: 0.4196 data: 0.3536 max mem: 5990
Epoch: [40] [20/31] eta: 0:00:03 lr: 0.001007 loss: 0.0180 (0.0245) time: 0.1678 data: 0.1042 max mem: 5990
Epoch: [40] [30/31] eta: 0:00:00 lr: 0.001007 loss: 0.0183 (0.0241) time: 0.1424 data: 0.0791 max mem: 5990
Epoch: [40] Total time: 0:00:08 (0.2658 s / it)
Averaged stats: lr: 0.001007 loss: 0.0183 (0.0241)
Test: [ 0/96] eta: 0:07:53 loss: 0.2155 (0.2155) acc1: 93.7500 (93.7500) acc5: 96.8750 (96.8750) time: 4.9343 data: 4.8908 max mem: 5990
Test: [10/96] eta: 0:01:09 loss: 0.4546 (0.4689) acc1: 89.0625 (87.6420) acc5: 96.8750 (96.1648) time: 0.8082 data: 0.7767 max mem: 5990
Test: [20/96] eta: 0:00:45 loss: 0.4485 (0.4578) acc1: 89.0625 (87.9464) acc5: 96.8750 (96.3542) time: 0.3794 data: 0.3491 max mem: 5990
Test: [30/96] eta: 0:00:34 loss: 0.4474 (0.4743) acc1: 87.5000 (87.8024) acc5: 96.8750 (96.1694) time: 0.3658 data: 0.3353 max mem: 5990
Test: [40/96] eta: 0:00:25 loss: 0.4308 (0.4520) acc1: 87.5000 (87.9954) acc5: 96.8750 (96.5701) time: 0.3199 data: 0.2891 max mem: 5990
Test: [50/96] eta: 0:00:19 loss: 0.3854 (0.4522) acc1: 87.5000 (87.8983) acc5: 96.8750 (96.4767) time: 0.2591 data: 0.2285 max mem: 5990
Test: [60/96] eta: 0:00:14 loss: 0.4091 (0.4512) acc1: 87.5000 (88.1148) acc5: 96.8750 (96.5164) time: 0.3007 data: 0.2704 max mem: 5990
Test: [70/96] eta: 0:00:10 loss: 0.3984 (0.4452) acc1: 89.0625 (88.2702) acc5: 96.8750 (96.6109) time: 0.3639 data: 0.3337 max mem: 5990
Test: [80/96] eta: 0:00:06 loss: 0.3927 (0.4474) acc1: 89.0625 (88.3295) acc5: 96.8750 (96.6049) time: 0.3465 data: 0.3162 max mem: 5990
Test: [90/96] eta: 0:00:02 loss: 0.4603 (0.4488) acc1: 89.0625 (88.4272) acc5: 96.8750 (96.5659) time: 0.2665 data: 0.2363 max mem: 5990
Test: [95/96] eta: 0:00:00 loss: 0.3451 (0.4363) acc1: 90.6250 (88.6771) acc5: 96.8750 (96.6146) time: 0.2657 data: 0.2363 max mem: 5990
Test: Total time: 0:00:34 (0.3638 s / it)
- Acc@1 88.677 Acc@5 96.615 loss 0.436
Accuracy of the network on the 6085 test images: 88.7%
Max accuracy: 89.29%
Epoch: [41] [ 0/31] eta: 0:01:36 lr: 0.000985 loss: 0.0088 (0.0088) time: 3.1201 data: 3.0440 max mem: 5990
Epoch: [41] [10/31] eta: 0:00:09 lr: 0.000985 loss: 0.0134 (0.0287) time: 0.4431 data: 0.3791 max mem: 5990
Epoch: [41] [20/31] eta: 0:00:03 lr: 0.000985 loss: 0.0215 (0.0294) time: 0.1589 data: 0.0959 max mem: 5990
Epoch: [41] [30/31] eta: 0:00:00 lr: 0.000985 loss: 0.0276 (0.0302) time: 0.1036 data: 0.0396 max mem: 5990
Epoch: [41] Total time: 0:00:07 (0.2469 s / it)
Averaged stats: lr: 0.000985 loss: 0.0276 (0.0302)
Epoch: [42] [ 0/31] eta: 0:01:35 lr: 0.000963 loss: 0.0116 (0.0116) time: 3.0968 data: 3.0098 max mem: 5990
Epoch: [42] [10/31] eta: 0:00:08 lr: 0.000963 loss: 0.0185 (0.0256) time: 0.4210 data: 0.3622 max mem: 5990
Epoch: [42] [20/31] eta: 0:00:03 lr: 0.000963 loss: 0.0185 (0.0241) time: 0.1398 data: 0.0849 max mem: 5990
Epoch: [42] [30/31] eta: 0:00:00 lr: 0.000963 loss: 0.0124 (0.0207) time: 0.0934 data: 0.0362 max mem: 5990
Epoch: [42] Total time: 0:00:07 (0.2355 s / it)
Averaged stats: lr: 0.000963 loss: 0.0124 (0.0207)
Epoch: [43] [ 0/31] eta: 0:01:34 lr: 0.000940 loss: 0.0168 (0.0168) time: 3.0562 data: 2.9686 max mem: 5990
Epoch: [43] [10/31] eta: 0:00:08 lr: 0.000940 loss: 0.0237 (0.0293) time: 0.3831 data: 0.3238 max mem: 5990
Epoch: [43] [20/31] eta: 0:00:02 lr: 0.000940 loss: 0.0146 (0.0259) time: 0.1184 data: 0.0626 max mem: 5990
Epoch: [43] [30/31] eta: 0:00:00 lr: 0.000940 loss: 0.0141 (0.0253) time: 0.0880 data: 0.0329 max mem: 5990
Epoch: [43] Total time: 0:00:06 (0.2151 s / it)
Averaged stats: lr: 0.000940 loss: 0.0141 (0.0253)
Epoch: [44] [ 0/31] eta: 0:01:42 lr: 0.000918 loss: 0.0147 (0.0147) time: 3.2935 data: 3.1996 max mem: 5990
Epoch: [44] [10/31] eta: 0:00:08 lr: 0.000918 loss: 0.0143 (0.0139) time: 0.3853 data: 0.3215 max mem: 5990
Epoch: [44] [20/31] eta: 0:00:02 lr: 0.000918 loss: 0.0115 (0.0171) time: 0.1043 data: 0.0427 max mem: 5990
Epoch: [44] [30/31] eta: 0:00:00 lr: 0.000918 loss: 0.0123 (0.0183) time: 0.0875 data: 0.0259 max mem: 5990
Epoch: [44] Total time: 0:00:06 (0.2175 s / it)
Averaged stats: lr: 0.000918 loss: 0.0123 (0.0183)
Epoch: [45] [ 0/31] eta: 0:01:37 lr: 0.000895 loss: 0.0113 (0.0113) time: 3.1479 data: 3.0481 max mem: 5990
Epoch: [45] [10/31] eta: 0:00:07 lr: 0.000895 loss: 0.0156 (0.0355) time: 0.3622 data: 0.2962 max mem: 5990
Epoch: [45] [20/31] eta: 0:00:02 lr: 0.000895 loss: 0.0215 (0.0338) time: 0.1071 data: 0.0439 max mem: 5990
Epoch: [45] [30/31] eta: 0:00:00 lr: 0.000895 loss: 0.0215 (0.0316) time: 0.0957 data: 0.0334 max mem: 5990
Epoch: [45] Total time: 0:00:06 (0.2153 s / it)
Averaged stats: lr: 0.000895 loss: 0.0215 (0.0316)
Epoch: [46] [ 0/31] eta: 0:01:31 lr: 0.000872 loss: 0.0109 (0.0109) time: 2.9549 data: 2.8505 max mem: 5990
Epoch: [46] [10/31] eta: 0:00:07 lr: 0.000872 loss: 0.0113 (0.0152) time: 0.3694 data: 0.2995 max mem: 5990
Epoch: [46] [20/31] eta: 0:00:02 lr: 0.000872 loss: 0.0113 (0.0185) time: 0.1238 data: 0.0590 max mem: 5990
Epoch: [46] [30/31] eta: 0:00:00 lr: 0.000872 loss: 0.0125 (0.0176) time: 0.1152 data: 0.0530 max mem: 5990
Epoch: [46] Total time: 0:00:07 (0.2305 s / it)
Averaged stats: lr: 0.000872 loss: 0.0125 (0.0176)
Epoch: [47] [ 0/31] eta: 0:01:17 lr: 0.000848 loss: 0.0133 (0.0133) time: 2.4957 data: 2.4238 max mem: 5990
Epoch: [47] [10/31] eta: 0:00:06 lr: 0.000848 loss: 0.0176 (0.0295) time: 0.3298 data: 0.2651 max mem: 5990
Epoch: [47] [20/31] eta: 0:00:02 lr: 0.000848 loss: 0.0129 (0.0260) time: 0.1340 data: 0.0709 max mem: 5990
Epoch: [47] [30/31] eta: 0:00:00 lr: 0.000848 loss: 0.0128 (0.0218) time: 0.1270 data: 0.0645 max mem: 5990
Epoch: [47] Total time: 0:00:07 (0.2294 s / it)
Averaged stats: lr: 0.000848 loss: 0.0128 (0.0218)
Epoch: [48] [ 0/31] eta: 0:01:08 lr: 0.000825 loss: 0.0100 (0.0100) time: 2.2256 data: 2.1582 max mem: 5990
Epoch: [48] [10/31] eta: 0:00:06 lr: 0.000825 loss: 0.0110 (0.0141) time: 0.3247 data: 0.2621 max mem: 5990
Epoch: [48] [20/31] eta: 0:00:02 lr: 0.000825 loss: 0.0103 (0.0179) time: 0.1522 data: 0.0891 max mem: 5990
Epoch: [48] [30/31] eta: 0:00:00 lr: 0.000825 loss: 0.0113 (0.0197) time: 0.1397 data: 0.0756 max mem: 5990
Epoch: [48] Total time: 0:00:07 (0.2292 s / it)
Averaged stats: lr: 0.000825 loss: 0.0113 (0.0197)
Epoch: [49] [ 0/31] eta: 0:01:34 lr: 0.000802 loss: 0.0097 (0.0097) time: 3.0617 data: 2.9836 max mem: 5990
Epoch: [49] [10/31] eta: 0:00:08 lr: 0.000802 loss: 0.0132 (0.0183) time: 0.4149 data: 0.3588 max mem: 5990
Epoch: [49] [20/31] eta: 0:00:03 lr: 0.000802 loss: 0.0132 (0.0203) time: 0.1696 data: 0.1110 max mem: 5990
Epoch: [49] [30/31] eta: 0:00:00 lr: 0.000802 loss: 0.0119 (0.0190) time: 0.1420 data: 0.0795 max mem: 5990
Epoch: [49] Total time: 0:00:08 (0.2639 s / it)
Averaged stats: lr: 0.000802 loss: 0.0119 (0.0190)
Epoch: [50] [ 0/31] eta: 0:01:38 lr: 0.000778 loss: 0.0072 (0.0072) time: 3.1723 data: 3.0932 max mem: 5990
Epoch: [50] [10/31] eta: 0:00:09 lr: 0.000778 loss: 0.0103 (0.0110) time: 0.4454 data: 0.3829 max mem: 5990
Epoch: [50] [20/31] eta: 0:00:03 lr: 0.000778 loss: 0.0102 (0.0141) time: 0.1691 data: 0.1066 max mem: 5990
Epoch: [50] [30/31] eta: 0:00:00 lr: 0.000778 loss: 0.0102 (0.0141) time: 0.1298 data: 0.0672 max mem: 5990
Epoch: [50] Total time: 0:00:08 (0.2675 s / it)
Averaged stats: lr: 0.000778 loss: 0.0102 (0.0141)
Test: [ 0/96] eta: 0:07:50 loss: 0.2313 (0.2313) acc1: 95.3125 (95.3125) acc5: 98.4375 (98.4375) time: 4.9021 data: 4.8717 max mem: 5990
Test: [10/96] eta: 0:01:06 loss: 0.4858 (0.4876) acc1: 87.5000 (87.6420) acc5: 96.8750 (96.3068) time: 0.7736 data: 0.7434 max mem: 5990
Test: [20/96] eta: 0:00:40 loss: 0.4759 (0.4746) acc1: 85.9375 (87.6488) acc5: 96.8750 (96.7262) time: 0.3109 data: 0.2807 max mem: 5990
Test: [30/96] eta: 0:00:28 loss: 0.4884 (0.4971) acc1: 85.9375 (87.2480) acc5: 96.8750 (96.4718) time: 0.2517 data: 0.2210 max mem: 5990
Test: [40/96] eta:
0:00:23 loss: 0.4499 (0.4725) acc1: 87.5000 (87.4619) acc5: 96.8750 (96.6845) time: 0.2936 data: 0.2628 max mem: 5990 Test: [50/96] eta: 0:00:17 loss: 0.3880 (0.4688) acc1: 89.0625 (87.5000) acc5: 96.8750 (96.6605) time: 0.3182 data: 0.2879 max mem: 5990 Test: [60/96] eta: 0:00:13 loss: 0.3880 (0.4620) acc1: 89.0625 (87.8074) acc5: 96.8750 (96.7469) time: 0.3044 data: 0.2742 max mem: 5990 Test: [70/96] eta: 0:00:10 loss: 0.3735 (0.4519) acc1: 90.6250 (88.1162) acc5: 96.8750 (96.7650) time: 0.3840 data: 0.3538 max mem: 5990 Test: [80/96] eta: 0:00:05 loss: 0.4107 (0.4568) acc1: 87.5000 (88.0787) acc5: 96.8750 (96.7400) time: 0.3638 data: 0.3335 max mem: 5990 Test: [90/96] eta: 0:00:02 loss: 0.4322 (0.4569) acc1: 87.5000 (88.1868) acc5: 96.8750 (96.7205) time: 0.2703 data: 0.2401 max mem: 5990 Test: [95/96] eta: 0:00:00 loss: 0.3713 (0.4449) acc1: 89.0625 (88.3813) acc5: 96.8750 (96.7790) time: 0.2695 data: 0.2401 max mem: 5990 Test: Total time: 0:00:33 (0.3496 s / it)
- Acc@1 88.381 Acc@5 96.779 loss 0.445
Accuracy of the network on the 6085 test images: 88.4%
Max accuracy: 89.29%
Epoch: [51]  [ 0/31]  eta: 0:00:53  lr: 0.000755  loss: 0.0098 (0.0098)  time: 1.7375  data: 1.6695  max mem: 5990
Epoch: [51]  [10/31]  eta: 0:00:07  lr: 0.000755  loss: 0.0137 (0.0158)  time: 0.3403  data: 0.2727  max mem: 5990
Epoch: [51]  [20/31]  eta: 0:00:02  lr: 0.000755  loss: 0.0111 (0.0132)  time: 0.1838  data: 0.1153  max mem: 5990
Epoch: [51]  [30/31]  eta: 0:00:00  lr: 0.000755  loss: 0.0132 (0.0155)  time: 0.1651  data: 0.1017  max mem: 5990
Epoch: [51] Total time: 0:00:07 (0.2508 s / it)
Averaged stats: lr: 0.000755  loss: 0.0132 (0.0155)
Epoch: [52]  [ 0/31]  eta: 0:01:34  lr: 0.000732  loss: 0.0081 (0.0081)  time: 3.0583  data: 2.9708  max mem: 5990
Epoch: [52]  [10/31]  eta: 0:00:09  lr: 0.000732  loss: 0.0105 (0.0110)  time: 0.4308  data: 0.3654  max mem: 5990
Epoch: [52]  [20/31]  eta: 0:00:03  lr: 0.000732  loss: 0.0105 (0.0191)  time: 0.1615  data: 0.1017  max mem: 5990
Epoch: [52]  [30/31]  eta: 0:00:00  lr: 0.000732  loss: 0.0090 (0.0179)  time: 0.1038  data: 0.0492  max mem: 5990
Epoch: [52] Total time: 0:00:07 (0.2341 s / it)
Averaged stats: lr: 0.000732  loss: 0.0090 (0.0179)
Epoch: [53]  [ 0/31]  eta: 0:01:42  lr: 0.000708  loss: 0.0092 (0.0092)  time: 3.2993  data: 3.2060  max mem: 5990
Epoch: [53]  [10/31]  eta: 0:00:09  lr: 0.000708  loss: 0.0087 (0.0110)  time: 0.4497  data: 0.3789  max mem: 5990
Epoch: [53]  [20/31]  eta: 0:00:03  lr: 0.000708  loss: 0.0083 (0.0109)  time: 0.1431  data: 0.0748  max mem: 5990
Epoch: [53]  [30/31]  eta: 0:00:00  lr: 0.000708  loss: 0.0089 (0.0123)  time: 0.0911  data: 0.0267  max mem: 5990
Epoch: [53] Total time: 0:00:07 (0.2440 s / it)
Averaged stats: lr: 0.000708  loss: 0.0089 (0.0123)
Epoch: [54]  [ 0/31]  eta: 0:01:36  lr: 0.000685  loss: 0.0090 (0.0090)  time: 3.0973  data: 3.0035  max mem: 5990
Epoch: [54]  [10/31]  eta: 0:00:08  lr: 0.000685  loss: 0.0090 (0.0093)  time: 0.4050  data: 0.3390  max mem: 5990
Epoch: [54]  [20/31]  eta: 0:00:03  lr: 0.000685  loss: 0.0088 (0.0157)  time: 0.1325  data: 0.0682  max mem: 5990
Epoch: [54]  [30/31]  eta: 0:00:00  lr: 0.000685  loss: 0.0101 (0.0151)  time: 0.0948  data: 0.0320  max mem: 5990
Epoch: [54] Total time: 0:00:07 (0.2298 s / it)
Averaged stats: lr: 0.000685  loss: 0.0101 (0.0151)
Epoch: [55]  [ 0/31]  eta: 0:01:39  lr: 0.000662  loss: 0.0058 (0.0058)  time: 3.2128  data: 3.1378  max mem: 5990
Epoch: [55]  [10/31]  eta: 0:00:08  lr: 0.000662  loss: 0.0087 (0.0107)  time: 0.4012  data: 0.3380  max mem: 5990
Epoch: [55]  [20/31]  eta: 0:00:02  lr: 0.000662  loss: 0.0084 (0.0104)  time: 0.1236  data: 0.0612  max mem: 5990
Epoch: [55]  [30/31]  eta: 0:00:00  lr: 0.000662  loss: 0.0078 (0.0102)  time: 0.1131  data: 0.0509  max mem: 5990
Epoch: [55] Total time: 0:00:07 (0.2410 s / it)
Averaged stats: lr: 0.000662  loss: 0.0078 (0.0102)
Epoch: [56]  [ 0/31]  eta: 0:01:15  lr: 0.000638  loss: 0.0072 (0.0072)  time: 2.4453  data: 2.3546  max mem: 5990
Epoch: [56]  [10/31]  eta: 0:00:07  lr: 0.000638  loss: 0.0072 (0.0089)  time: 0.3441  data: 0.2845  max mem: 5990
Epoch: [56]  [20/31]  eta: 0:00:02  lr: 0.000638  loss: 0.0081 (0.0094)  time: 0.1547  data: 0.0954  max mem: 5990
Epoch: [56]  [30/31]  eta: 0:00:00  lr: 0.000638  loss: 0.0070 (0.0091)  time: 0.1409  data: 0.0771  max mem: 5990
Epoch: [56] Total time: 0:00:07 (0.2388 s / it)
Averaged stats: lr: 0.000638  loss: 0.0070 (0.0091)
Epoch: [57]  [ 0/31]  eta: 0:01:09  lr: 0.000615  loss: 0.0065 (0.0065)  time: 2.2337  data: 2.1538  max mem: 5990
Epoch: [57]  [10/31]  eta: 0:00:07  lr: 0.000615  loss: 0.0071 (0.0100)  time: 0.3613  data: 0.2965  max mem: 5990
Epoch: [57]  [20/31]  eta: 0:00:03  lr: 0.000615  loss: 0.0071 (0.0106)  time: 0.1874  data: 0.1222  max mem: 5990
Epoch: [57]  [30/31]  eta: 0:00:00  lr: 0.000615  loss: 0.0085 (0.0135)  time: 0.1515  data: 0.0871  max mem: 5990
Epoch: [57] Total time: 0:00:07 (0.2520 s / it)
Averaged stats: lr: 0.000615  loss: 0.0085 (0.0135)
Epoch: [58]  [ 0/31]  eta: 0:01:35  lr: 0.000592  loss: 0.0040 (0.0040)  time: 3.0887  data: 3.0100  max mem: 5990
Epoch: [58]  [10/31]  eta: 0:00:09  lr: 0.000592  loss: 0.0075 (0.0170)  time: 0.4291  data: 0.3649  max mem: 5990
Epoch: [58]  [20/31]  eta: 0:00:03  lr: 0.000592  loss: 0.0071 (0.0137)  time: 0.1732  data: 0.1090  max mem: 5990
Epoch: [58]  [30/31]  eta: 0:00:00  lr: 0.000592  loss: 0.0083 (0.0121)  time: 0.1386  data: 0.0696  max mem: 5990
Epoch: [58] Total time: 0:00:08 (0.2670 s / it)
Averaged stats: lr: 0.000592  loss: 0.0083 (0.0121)
Epoch: [59]  [ 0/31]  eta: 0:01:35  lr: 0.000570  loss: 0.0041 (0.0041)  time: 3.0672  data: 2.9870  max mem: 5990
Epoch: [59]  [10/31]  eta: 0:00:09  lr: 0.000570  loss: 0.0089 (0.0133)  time: 0.4346  data: 0.3711  max mem: 5990
Epoch: [59]  [20/31]  eta: 0:00:03  lr: 0.000570  loss: 0.0105 (0.0189)  time: 0.1712  data: 0.1093  max mem: 5990
Epoch: [59]  [30/31]  eta: 0:00:00  lr: 0.000570  loss: 0.0090 (0.0156)  time: 0.1339  data: 0.0722  max mem: 5990
Epoch: [59] Total time: 0:00:08 (0.2666 s / it)
Averaged stats: lr: 0.000570  loss: 0.0090 (0.0156)
Epoch: [60]  [ 0/31]  eta: 0:01:35  lr: 0.000547  loss: 0.0104 (0.0104)  time: 3.0744  data: 2.9995  max mem: 5990
Epoch: [60]  [10/31]  eta: 0:00:08  lr: 0.000547  loss: 0.0080 (0.0109)  time: 0.4144  data: 0.3510  max mem: 5990
Epoch: [60]  [20/31]  eta: 0:00:03  lr: 0.000547  loss: 0.0082 (0.0123)  time: 0.1660  data: 0.1024  max mem: 5990
Epoch: [60]  [30/31]  eta: 0:00:00  lr: 0.000547  loss: 0.0097 (0.0113)  time: 0.1290  data: 0.0639  max mem: 5990
Epoch: [60] Total time: 0:00:07 (0.2507 s / it)
Averaged stats: lr: 0.000547  loss: 0.0097 (0.0113)
Test:  [ 0/96]  eta: 0:06:57  loss: 0.1867 (0.1867)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 4.3471  data: 4.3166  max mem: 5990
Test:  [10/96]  eta: 0:01:02  loss: 0.4793 (0.4837)  acc1: 87.5000 (87.2159)  acc5: 95.3125 (96.3068)  time: 0.7262  data: 0.6959  max mem: 5990
Test:  [20/96]  eta: 0:00:38  loss: 0.4793 (0.4712)  acc1: 87.5000 (87.6488)  acc5: 96.8750 (96.5774)  time: 0.3208  data: 0.2904  max mem: 5990
Test:  [30/96]  eta: 0:00:31  loss: 0.4809 (0.4955)  acc1: 87.5000 (87.4496)  acc5: 96.8750 (96.2702)  time: 0.3336  data: 0.3031  max mem: 5990
Test:  [40/96]  eta: 0:00:23  loss: 0.4213 (0.4702)  acc1: 89.0625 (87.7668)  acc5: 96.8750 (96.5701)  time: 0.3109  data: 0.2807  max mem: 5990
Test:  [50/96]  eta: 0:00:18  loss: 0.3780 (0.4699)  acc1: 89.0625 (87.7757)  acc5: 96.8750 (96.5380)  time: 0.3039  data: 0.2736  max mem: 5990
Test:  [60/96]  eta: 0:00:13  loss: 0.3883 (0.4598)  acc1: 90.6250 (88.1404)  acc5: 96.8750 (96.5932)  time: 0.3339  data: 0.3037  max mem: 5990
Test:  [70/96]  eta: 0:00:10  loss: 0.3641 (0.4507)  acc1: 90.6250 (88.4463)  acc5: 96.8750 (96.6109)  time: 0.3670  data: 0.3367  max mem: 5990
Test:  [80/96]  eta: 0:00:06  loss: 0.4077 (0.4580)  acc1: 89.0625 (88.3295)  acc5: 96.8750 (96.6049)  time: 0.3648  data: 0.3346  max mem: 5990
Test:  [90/96]  eta: 0:00:02  loss: 0.4535 (0.4563)  acc1: 87.5000 (88.5302)  acc5: 96.8750 (96.6003)  time: 0.1984  data: 0.1681  max mem: 5990
Test:  [95/96]  eta: 0:00:00  loss: 0.3923 (0.4443)  acc1: 90.6250 (88.7757)  acc5: 98.4375 (96.6639)  time: 0.1976  data: 0.1681  max mem: 5990
Test: Total time: 0:00:32 (0.3432 s / it)
- Acc@1 88.776 Acc@5 96.664 loss 0.444
Accuracy of the network on the 6085 test images: 88.8%
Max accuracy: 89.29%
Epoch: [61]  [ 0/31]  eta: 0:01:35  lr: 0.000525  loss: 0.0040 (0.0040)  time: 3.0941  data: 3.0008  max mem: 5990
Epoch: [61]  [10/31]  eta: 0:00:08  lr: 0.000525  loss: 0.0062 (0.0110)  time: 0.4075  data: 0.3407  max mem: 5990
Epoch: [61]  [20/31]  eta: 0:00:03  lr: 0.000525  loss: 0.0066 (0.0099)  time: 0.1322  data: 0.0693  max mem: 5990
Epoch: [61]  [30/31]  eta: 0:00:00  lr: 0.000525  loss: 0.0066 (0.0109)  time: 0.0934  data: 0.0320  max mem: 5990
Epoch: [61] Total time: 0:00:07 (0.2298 s / it)
Averaged stats: lr: 0.000525  loss: 0.0066 (0.0109)
Epoch: [62]  [ 0/31]  eta: 0:01:33  lr: 0.000503  loss: 0.0035 (0.0035)  time: 3.0026  data: 2.9231  max mem: 5990
Epoch: [62]  [10/31]  eta: 0:00:07  lr: 0.000503  loss: 0.0055 (0.0079)  time: 0.3574  data: 0.2917  max mem: 5990
Epoch: [62]  [20/31]  eta: 0:00:02  lr: 0.000503  loss: 0.0065 (0.0096)  time: 0.1048  data: 0.0411  max mem: 5990
Epoch: [62]  [30/31]  eta: 0:00:00  lr: 0.000503  loss: 0.0076 (0.0139)  time: 0.1086  data: 0.0452  max mem: 5990
Epoch: [62] Total time: 0:00:06 (0.2236 s / it)
Averaged stats: lr: 0.000503  loss: 0.0076 (0.0139)
Epoch: [63]  [ 0/31]  eta: 0:01:25  lr: 0.000481  loss: 0.0091 (0.0091)  time: 2.7422  data: 2.6649  max mem: 5990
Epoch: [63]  [10/31]  eta: 0:00:07  lr: 0.000481  loss: 0.0093 (0.0092)  time: 0.3541  data: 0.2915  max mem: 5990
Epoch: [63]  [20/31]  eta: 0:00:02  lr: 0.000481  loss: 0.0077 (0.0113)  time: 0.1268  data: 0.0653  max mem: 5990
Epoch: [63]  [30/31]  eta: 0:00:00  lr: 0.000481  loss: 0.0067 (0.0105)  time: 0.1157  data: 0.0540  max mem: 5990
Epoch: [63] Total time: 0:00:06 (0.2257 s / it)
Averaged stats: lr: 0.000481  loss: 0.0067 (0.0105)
Epoch: [64]  [ 0/31]  eta: 0:01:10  lr: 0.000459  loss: 0.0108 (0.0108)  time: 2.2844  data: 2.2115  max mem: 5990
Epoch: [64]  [10/31]  eta: 0:00:06  lr: 0.000459  loss: 0.0087 (0.0152)  time: 0.3219  data: 0.2588  max mem: 5990
Epoch: [64]  [20/31]  eta: 0:00:02  lr: 0.000459  loss: 0.0087 (0.0126)  time: 0.1577  data: 0.0948  max mem: 5990
Epoch: [64]  [30/31]  eta: 0:00:00  lr: 0.000459  loss: 0.0066 (0.0134)  time: 0.1502  data: 0.0874  max mem: 5990
Epoch: [64] Total time: 0:00:07 (0.2352 s / it)
Averaged stats: lr: 0.000459  loss: 0.0066 (0.0134)
Epoch: [65]  [ 0/31]  eta: 0:00:51  lr: 0.000438  loss: 0.0136 (0.0136)  time: 1.6594  data: 1.5836  max mem: 5990
Epoch: [65]  [10/31]  eta: 0:00:06  lr: 0.000438  loss: 0.0062 (0.0085)  time: 0.3064  data: 0.2420  max mem: 5990
Epoch: [65]  [20/31]  eta: 0:00:02  lr: 0.000438  loss: 0.0062 (0.0078)  time: 0.1715  data: 0.1088  max mem: 5990
Epoch: [65]  [30/31]  eta: 0:00:00  lr: 0.000438  loss: 0.0064 (0.0075)  time: 0.1710  data: 0.1083  max mem: 5990
Epoch: [65] Total time: 0:00:07 (0.2355 s / it)
Averaged stats: lr: 0.000438  loss: 0.0064 (0.0075)
Epoch: [66]  [ 0/31]  eta: 0:01:25  lr: 0.000417  loss: 0.0055 (0.0055)  time: 2.7548  data: 2.6797  max mem: 5990
Epoch: [66]  [10/31]  eta: 0:00:08  lr: 0.000417  loss: 0.0069 (0.0145)  time: 0.4007  data: 0.3371  max mem: 5990
Epoch: [66]  [20/31]  eta: 0:00:03  lr: 0.000417  loss: 0.0052 (0.0104)  time: 0.1742  data: 0.1120  max mem: 5990
Epoch: [66]  [30/31]  eta: 0:00:00  lr: 0.000417  loss: 0.0058 (0.0097)  time: 0.1391  data: 0.0781  max mem: 5990
Epoch: [66] Total time: 0:00:07 (0.2573 s / it)
Averaged stats: lr: 0.000417  loss: 0.0058 (0.0097)
Epoch: [67]  [ 0/31]  eta: 0:01:35  lr: 0.000396  loss: 0.0082 (0.0082)  time: 3.0880  data: 3.0084  max mem: 5990
Epoch: [67]  [10/31]  eta: 0:00:08  lr: 0.000396  loss: 0.0082 (0.0271)  time: 0.4222  data: 0.3565  max mem: 5990
Epoch: [67]  [20/31]  eta: 0:00:03  lr: 0.000396  loss: 0.0063 (0.0172)  time: 0.1695  data: 0.1050  max mem: 5990
Epoch: [67]  [30/31]  eta: 0:00:00  lr: 0.000396  loss: 0.0065 (0.0141)  time: 0.1399  data: 0.0766  max mem: 5990
Epoch: [67] Total time: 0:00:08 (0.2662 s / it)
Averaged stats: lr: 0.000396  loss: 0.0065 (0.0141)
Epoch: [68]  [ 0/31]  eta: 0:01:35  lr: 0.000376  loss: 0.0062 (0.0062)  time: 3.0840  data: 3.0090  max mem: 5990
Epoch: [68]  [10/31]  eta: 0:00:09  lr: 0.000376  loss: 0.0062 (0.0076)  time: 0.4341  data: 0.3711  max mem: 5990
Epoch: [68]  [20/31]  eta: 0:00:03  lr: 0.000376  loss: 0.0078 (0.0138)  time: 0.1747  data: 0.1133  max mem: 5990
Epoch: [68]  [30/31]  eta: 0:00:00  lr: 0.000376  loss: 0.0065 (0.0114)  time: 0.1365  data: 0.0757  max mem: 5990
Epoch: [68] Total time: 0:00:08 (0.2649 s / it)
Averaged stats: lr: 0.000376  loss: 0.0065 (0.0114)
Epoch: [69]  [ 0/31]  eta: 0:01:36  lr: 0.000356  loss: 0.0109 (0.0109)  time: 3.1225  data: 3.0467  max mem: 5990
Epoch: [69]  [10/31]  eta: 0:00:09  lr: 0.000356  loss: 0.0078 (0.0079)  time: 0.4360  data: 0.3744  max mem: 5990
Epoch: [69]  [20/31]  eta: 0:00:03  lr: 0.000356  loss: 0.0062 (0.0096)  time: 0.1614  data: 0.0998  max mem: 5990
Epoch: [69]  [30/31]  eta: 0:00:00  lr: 0.000356  loss: 0.0066 (0.0089)  time: 0.1101  data: 0.0485  max mem: 5990
Epoch: [69] Total time: 0:00:07 (0.2419 s / it)
Averaged stats: lr: 0.000356  loss: 0.0066 (0.0089)
Epoch: [70]  [ 0/31]  eta: 0:00:49  lr: 0.000336  loss: 0.0056 (0.0056)  time: 1.5838  data: 1.5146  max mem: 5990
Epoch: [70]  [10/31]  eta: 0:00:06  lr: 0.000336  loss: 0.0077 (0.0088)  time: 0.3100  data: 0.2464  max mem: 5990
Epoch: [70]  [20/31]  eta: 0:00:02  lr: 0.000336  loss: 0.0077 (0.0092)  time: 0.1749  data: 0.1119  max mem: 5990
Epoch: [70]  [30/31]  eta: 0:00:00  lr: 0.000336  loss: 0.0060 (0.0085)  time: 0.1695  data: 0.1060  max mem: 5990
Epoch: [70] Total time: 0:00:07 (0.2408 s / it)
Averaged stats: lr: 0.000336  loss: 0.0060 (0.0085)
Test:  [ 0/96]  eta: 0:08:06  loss: 0.2164 (0.2164)  acc1: 93.7500 (93.7500)  acc5: 98.4375 (98.4375)  time: 5.0693  data: 5.0277  max mem: 5990
Test:  [10/96]  eta: 0:01:01  loss: 0.4552 (0.4557)  acc1: 89.0625 (88.2102)  acc5: 96.8750 (96.8750)  time: 0.7153  data: 0.6814  max mem: 5990
Test:  [20/96]  eta: 0:00:41  loss: 0.4552 (0.4410)  acc1: 89.0625 (88.6161)  acc5: 96.8750 (96.8750)  time: 0.3199  data: 0.2881  max mem: 5990
Test:  [30/96]  eta: 0:00:30  loss: 0.4656 (0.4607)  acc1: 87.5000 (88.1552)  acc5: 96.8750 (96.7742)  time: 0.3333  data: 0.3025  max mem: 5990
Test:  [40/96]  eta: 0:00:23  loss: 0.4030 (0.4314)  acc1: 89.0625 (88.9101)  acc5: 98.4375 (97.1418)  time: 0.3010  data: 0.2701  max mem: 5990
Test:  [50/96]  eta: 0:00:19  loss: 0.3447 (0.4319)  acc1: 90.6250 (89.0012)  acc5: 96.8750 (96.9975)  time: 0.3354  data: 0.3040  max mem: 5990
Test:  [60/96]  eta: 0:00:13  loss: 0.3833 (0.4259)  acc1: 89.0625 (89.2162)  acc5: 96.8750 (97.1055)  time: 0.3066  data: 0.2747  max mem: 5990
Test:  [70/96]  eta: 0:00:10  loss: 0.3302 (0.4166)  acc1: 90.6250 (89.4146)  acc5: 98.4375 (97.1391)  time: 0.3583  data: 0.3272  max mem: 5990
Test:  [80/96]  eta: 0:00:06  loss: 0.3834 (0.4225)  acc1: 90.6250 (89.2747)  acc5: 96.8750 (97.1065)  time: 0.3574  data: 0.3270  max mem: 5990
Test:  [90/96]  eta: 0:00:02  loss: 0.4139 (0.4223)  acc1: 90.6250 (89.3372)  acc5: 96.8750 (97.0467)  time: 0.2473  data: 0.2169  max mem: 5990
Test:  [95/96]  eta: 0:00:00  loss: 0.3682 (0.4109)  acc1: 92.1875 (89.5152)  acc5: 96.8750 (97.0912)  time: 0.2516  data: 0.2216  max mem: 5990
Test: Total time: 0:00:34 (0.3578 s / it)
- Acc@1 89.515 Acc@5 97.091 loss 0.411
Accuracy of the network on the 6085 test images: 89.5%
Max accuracy: 89.52%
Epoch: [71]  [ 0/31]  eta: 0:01:17  lr: 0.000317  loss: 0.0050 (0.0050)  time: 2.5148  data: 2.4258  max mem: 5990
Epoch: [71]  [10/31]  eta: 0:00:07  lr: 0.000317  loss: 0.0079 (0.0065)  time: 0.3771  data: 0.3002  max mem: 5990
Epoch: [71]  [20/31]  eta: 0:00:03  lr: 0.000317  loss: 0.0057 (0.0063)  time: 0.1749  data: 0.0990  max mem: 5990
Epoch: [71]  [30/31]  eta: 0:00:00  lr: 0.000317  loss: 0.0050 (0.0060)  time: 0.1367  data: 0.0622  max mem: 5990
Epoch: [71] Total time: 0:00:07 (0.2356 s / it)
Averaged stats: lr: 0.000317  loss: 0.0050 (0.0060)
Epoch: [72]  [ 0/31]  eta: 0:01:21  lr: 0.000298  loss: 0.0072 (0.0072)  time: 2.6181  data: 2.5238  max mem: 5990
Epoch: [72]  [10/31]  eta: 0:00:08  lr: 0.000298  loss: 0.0056 (0.0074)  time: 0.3975  data: 0.3201  max mem: 5990
Epoch: [72]  [20/31]  eta: 0:00:03  lr: 0.000298  loss: 0.0054 (0.0081)  time: 0.1761  data: 0.1008  max mem: 5990
Epoch: [72]  [30/31]  eta: 0:00:00  lr: 0.000298  loss: 0.0050 (0.0105)  time: 0.1375  data: 0.0621  max mem: 5990
Epoch: [72] Total time: 0:00:07 (0.2522 s / it)
Averaged stats: lr: 0.000298  loss: 0.0050 (0.0105)
Epoch: [73]  [ 0/31]  eta: 0:01:36  lr: 0.000280  loss: 0.0089 (0.0089)  time: 3.1068  data: 3.0309  max mem: 5990
Epoch: [73]  [10/31]  eta: 0:00:09  lr: 0.000280  loss: 0.0084 (0.0075)  time: 0.4289  data: 0.3655  max mem: 5990
Epoch: [73]  [20/31]  eta: 0:00:03  lr: 0.000280  loss: 0.0063 (0.0083)  time: 0.1545  data: 0.0922  max mem: 5990
Epoch: [73]  [30/31]  eta: 0:00:00  lr: 0.000280  loss: 0.0054 (0.0087)  time: 0.1093  data: 0.0465  max mem: 5990
Epoch: [73] Total time: 0:00:07 (0.2449 s / it)
Averaged stats: lr: 0.000280  loss: 0.0054 (0.0087)
Epoch: [74]  [ 0/31]  eta: 0:01:35  lr: 0.000262  loss: 0.0110 (0.0110)  time: 3.0858  data: 2.9979  max mem: 5990
Epoch: [74]  [10/31]  eta: 0:00:08  lr: 0.000262  loss: 0.0052 (0.0100)  time: 0.4196  data: 0.3542  max mem: 5990
Epoch: [74]  [20/31]  eta: 0:00:03  lr: 0.000262  loss: 0.0045 (0.0073)  time: 0.1409  data: 0.0750  max mem: 5990
Epoch: [74]  [30/31]  eta: 0:00:00  lr: 0.000262  loss: 0.0043 (0.0071)  time: 0.0932  data: 0.0301  max mem: 5990
Epoch: [74] Total time: 0:00:07 (0.2329 s / it)
Averaged stats: lr: 0.000262  loss: 0.0043 (0.0071)
Epoch: [75]  [ 0/31]  eta: 0:01:36  lr: 0.000245  loss: 0.0045 (0.0045)  time: 3.1166  data: 3.0300  max mem: 5990
Epoch: [75]  [10/31]  eta: 0:00:08  lr: 0.000245  loss: 0.0057 (0.0095)  time: 0.4217  data: 0.3561  max mem: 5990
Epoch: [75]  [20/31]  eta: 0:00:03  lr: 0.000245  loss: 0.0050 (0.0114)  time: 0.1651  data: 0.1017  max mem: 5990
Epoch: [75]  [30/31]  eta: 0:00:00  lr: 0.000245  loss: 0.0048 (0.0101)  time: 0.1424  data: 0.0814  max mem: 5990
Epoch: [75] Total time: 0:00:08 (0.2663 s / it)
Averaged stats: lr: 0.000245  loss: 0.0048 (0.0101)
Epoch: [76]  [ 0/31]  eta: 0:01:33  lr: 0.000228  loss: 0.0042 (0.0042)  time: 3.0015  data: 2.9266  max mem: 5990
Epoch: [76]  [10/31]  eta: 0:00:09  lr: 0.000228  loss: 0.0053 (0.0151)  time: 0.4331  data: 0.3698  max mem: 5990
Epoch: [76]  [20/31]  eta: 0:00:03  lr: 0.000228  loss: 0.0067 (0.0124)  time: 0.1765  data: 0.1156  max mem: 5990
Epoch: [76]  [30/31]  eta: 0:00:00  lr: 0.000228  loss: 0.0067 (0.0128)  time: 0.1354  data: 0.0747  max mem: 5990
Epoch: [76] Total time: 0:00:08 (0.2649 s / it)
Averaged stats: lr: 0.000228  loss: 0.0067 (0.0128)
Epoch: [77]  [ 0/31]  eta: 0:01:36  lr: 0.000212  loss: 0.0058 (0.0058)  time: 3.1050  data: 3.0319  max mem: 5990
Epoch: [77]  [10/31]  eta: 0:00:09  lr: 0.000212  loss: 0.0067 (0.0066)  time: 0.4348  data: 0.3760  max mem: 5990
Epoch: [77]  [20/31]  eta: 0:00:03  lr: 0.000212  loss: 0.0063 (0.0079)  time: 0.1708  data: 0.1090  max mem: 5990
Epoch: [77]  [30/31]  eta: 0:00:00  lr: 0.000212  loss: 0.0048 (0.0069)  time: 0.1386  data: 0.0750  max mem: 5990
Epoch: [77] Total time: 0:00:08 (0.2663 s / it)
Averaged stats: lr: 0.000212  loss: 0.0048 (0.0069)
Epoch: [78]  [ 0/31]  eta: 0:01:26  lr: 0.000196  loss: 0.0052 (0.0052)  time: 2.7938  data: 2.7248  max mem: 5990
Epoch: [78]  [10/31]  eta: 0:00:08  lr: 0.000196  loss: 0.0038 (0.0043)  time: 0.3916  data: 0.3361  max mem: 5990
Epoch: [78]  [20/31]  eta: 0:00:02  lr: 0.000196  loss: 0.0039 (0.0159)  time: 0.1341  data: 0.0800  max mem: 5990
Epoch: [78]  [30/31]  eta: 0:00:00  lr: 0.000196  loss: 0.0061 (0.0141)  time: 0.0848  data: 0.0314  max mem: 5990
Epoch: [78] Total time: 0:00:06 (0.2124 s / it)
Averaged stats: lr: 0.000196  loss: 0.0061 (0.0141)
Epoch: [79]  [ 0/31]  eta: 0:01:29  lr: 0.000181  loss: 0.0041 (0.0041)  time: 2.8878  data: 2.8045  max mem: 5990
Epoch: [79]  [10/31]  eta: 0:00:08  lr: 0.000181  loss: 0.0059 (0.0080)  time: 0.4225  data: 0.3577  max mem: 5990
Epoch: [79]  [20/31]  eta: 0:00:03  lr: 0.000181  loss: 0.0056 (0.0072)  time: 0.1649  data: 0.1010  max mem: 5990
Epoch: [79]  [30/31]  eta: 0:00:00  lr: 0.000181  loss: 0.0051 (0.0114)  time: 0.1072  data: 0.0445  max mem: 5990
Epoch: [79] Total time: 0:00:07 (0.2377 s / it)
Averaged stats: lr: 0.000181  loss: 0.0051 (0.0114)
Epoch: [80]  [ 0/31]  eta: 0:01:31  lr: 0.000166  loss: 0.0069 (0.0069)  time: 2.9463  data: 2.8639  max mem: 5990
Epoch: [80]  [10/31]  eta: 0:00:08  lr: 0.000166  loss: 0.0053 (0.0057)  time: 0.4197  data: 0.3541  max mem: 5990
Epoch: [80]  [20/31]  eta: 0:00:02  lr: 0.000166  loss: 0.0047 (0.0056)  time: 0.1339  data: 0.0697  max mem: 5990
Epoch: [80]  [30/31]  eta: 0:00:00  lr: 0.000166  loss: 0.0043 (0.0068)  time: 0.0807  data: 0.0182  max mem: 5990
Epoch: [80] Total time: 0:00:07 (0.2306 s / it)
Averaged stats: lr: 0.000166  loss: 0.0043 (0.0068)
Test:  [ 0/96]  eta: 0:07:16  loss: 0.2601 (0.2601)  acc1: 92.1875 (92.1875)  acc5: 96.8750 (96.8750)  time: 4.5476  data: 4.5136  max mem: 5990
Test:  [10/96]  eta: 0:01:02  loss: 0.4632 (0.4555)  acc1: 87.5000 (88.4943)  acc5: 96.8750 (97.0170)  time: 0.7220  data: 0.6911  max mem: 5990
Test:  [20/96]  eta: 0:00:42  loss: 0.4546 (0.4474)  acc1: 87.5000 (88.6161)  acc5: 96.8750 (97.0982)  time: 0.3547  data: 0.3244  max mem: 5990
Test:  [30/96]  eta: 0:00:30  loss: 0.4320 (0.4683)  acc1: 87.5000 (88.5081)  acc5: 96.8750 (96.9254)  time: 0.3142  data: 0.2840  max mem: 5990
Test:  [40/96]  eta: 0:00:24  loss: 0.3852 (0.4405)  acc1: 89.0625 (89.1006)  acc5: 96.8750 (97.2942)  time: 0.2996  data: 0.2694  max mem: 5990
Test:  [50/96]  eta: 0:00:18  loss: 0.3697 (0.4396)  acc1: 90.6250 (89.0319)  acc5: 98.4375 (97.2733)  time: 0.3314  data: 0.3011  max mem: 5990
Test:  [60/96]  eta: 0:00:14  loss: 0.3812 (0.4355)  acc1: 90.6250 (89.0369)  acc5: 96.8750 (97.3361)  time: 0.3050  data: 0.2747  max mem: 5990
Test:  [70/96]  eta: 0:00:10  loss: 0.3530 (0.4249)  acc1: 92.1875 (89.3486)  acc5: 96.8750 (97.3151)  time: 0.3416  data: 0.3113  max mem: 5990
Test:  [80/96]  eta: 0:00:05  loss: 0.3464 (0.4275)  acc1: 90.6250 (89.2554)  acc5: 96.8750 (97.3187)  time: 0.3219  data: 0.2917  max mem: 5990
Test:  [90/96]  eta: 0:00:02  loss: 0.4119 (0.4290)  acc1: 90.6250 (89.2857)  acc5: 96.8750 (97.2184)  time: 0.2647  data: 0.2345  max mem: 5990
Test:  [95/96]  eta: 0:00:00  loss: 0.3219 (0.4178)  acc1: 92.1875 (89.4495)  acc5: 98.4375 (97.2555)  time: 0.2732  data: 0.2438  max mem: 5990
Test: Total time: 0:00:33 (0.3512 s / it)
- Acc@1 89.449 Acc@5 97.256 loss 0.418
Accuracy of the network on the 6085 test images: 89.4%
Max accuracy: 89.52%
Epoch: [81]  [ 0/31]  eta: 0:01:00  lr: 0.000152  loss: 0.0033 (0.0033)  time: 1.9536  data: 1.8778  max mem: 5990
Epoch: [81]  [10/31]  eta: 0:00:07  lr: 0.000152  loss: 0.0079 (0.0114)  time: 0.3409  data: 0.2770  max mem: 5990
Epoch: [81]  [20/31]  eta: 0:00:02  lr: 0.000152  loss: 0.0061 (0.0099)  time: 0.1721  data: 0.1093  max mem: 5990
Epoch: [81]  [30/31]  eta: 0:00:00  lr: 0.000152  loss: 0.0043 (0.0140)  time: 0.1666  data: 0.1034  max mem: 5990
Epoch: [81] Total time: 0:00:07 (0.2499 s / it)
Averaged stats: lr: 0.000152  loss: 0.0043 (0.0140)
Epoch: [82]  [ 0/31]  eta: 0:01:37  lr: 0.000139  loss: 0.0047 (0.0047)  time: 3.1502  data: 3.0585  max mem: 5990
Epoch: [82]  [10/31]  eta: 0:00:09  lr: 0.000139  loss: 0.0047 (0.0064)  time: 0.4325  data: 0.3660  max mem: 5990
Epoch: [82]  [20/31]  eta: 0:00:03  lr: 0.000139  loss: 0.0046 (0.0101)  time: 0.1590  data: 0.0954  max mem: 5990
Epoch: [82]  [30/31]  eta: 0:00:00  lr: 0.000139  loss: 0.0045 (0.0096)  time: 0.1084  data: 0.0471  max mem: 5990
Epoch: [82] Total time: 0:00:07 (0.2423 s / it)
Averaged stats: lr: 0.000139  loss: 0.0045 (0.0096)
Epoch: [83]  [ 0/31]  eta: 0:01:36  lr: 0.000126  loss: 0.0058 (0.0058)  time: 3.0970  data: 3.0211  max mem: 5990
Epoch: [83]  [10/31]  eta: 0:00:08  lr: 0.000126  loss: 0.0055 (0.0080)  time: 0.4176  data: 0.3549  max mem: 5990
Epoch: [83]  [20/31]  eta: 0:00:02  lr: 0.000126  loss: 0.0054 (0.0070)  time: 0.1300  data: 0.0676  max mem: 5990
Epoch: [83]  [30/31]  eta: 0:00:00  lr: 0.000126  loss: 0.0057 (0.0085)  time: 0.0864  data: 0.0239  max mem: 5990
Epoch: [83] Total time: 0:00:07 (0.2282 s / it)
Averaged stats: lr: 0.000126  loss: 0.0057 (0.0085)
Epoch: [84]  [ 0/31]  eta: 0:01:40  lr: 0.000114  loss: 0.0079 (0.0079)  time: 3.2559  data: 3.1820  max mem: 5990
Epoch: [84]  [10/31]  eta: 0:00:09  lr: 0.000114  loss: 0.0065 (0.0068)  time: 0.4449  data: 0.3821  max mem: 5990
Epoch: [84]  [20/31]  eta: 0:00:03  lr: 0.000114  loss: 0.0052 (0.0061)  time: 0.1653  data: 0.1037  max mem: 5990
Epoch: [84]  [30/31]  eta: 0:00:00  lr: 0.000114  loss: 0.0047 (0.0059)  time: 0.1292  data: 0.0680  max mem: 5990
Epoch: [84] Total time: 0:00:08 (0.2646 s / it)
Averaged stats: lr: 0.000114  loss: 0.0047 (0.0059)
Epoch: [85]  [ 0/31]  eta: 0:01:35  lr: 0.000102  loss: 0.0046 (0.0046)  time: 3.0813  data: 2.9957  max mem: 5990
Epoch: [85]  [10/31]  eta: 0:00:08  lr: 0.000102  loss: 0.0045 (0.0059)  time: 0.4237  data: 0.3585  max mem: 5990
Epoch: [85]  [20/31]  eta: 0:00:03  lr: 0.000102  loss: 0.0047 (0.0063)  time: 0.1667  data: 0.1034  max mem: 5990
Epoch: [85]  [30/31]  eta: 0:00:00  lr: 0.000102  loss: 0.0054 (0.0066)  time: 0.1376  data: 0.0749  max mem: 5990
Epoch: [85] Total time: 0:00:08 (0.2633 s / it)
Averaged stats: lr: 0.000102  loss: 0.0054 (0.0066)
Epoch: [86]  [ 0/31]  eta: 0:01:32  lr: 0.000091  loss: 0.0157 (0.0157)  time: 2.9781  data: 2.8861  max mem: 5990
Epoch: [86]  [10/31]  eta: 0:00:09  lr: 0.000091  loss: 0.0061 (0.0084)  time: 0.4332  data: 0.3674  max mem: 5990
Epoch: [86]  [20/31]  eta: 0:00:03  lr: 0.000091  loss: 0.0051 (0.0078)  time: 0.1787  data: 0.1156  max mem: 5990
Epoch: [86]  [30/31]  eta: 0:00:00  lr: 0.000091  loss: 0.0057 (0.0122)  time: 0.1358  data: 0.0740  max mem: 5990
Epoch: [86] Total time: 0:00:08 (0.2651 s / it)
Averaged stats: lr: 0.000091  loss: 0.0057 (0.0122)
Epoch: [87]  [ 0/31]  eta: 0:01:28  lr: 0.000081  loss: 0.0043 (0.0043)  time: 2.8468  data: 2.7676  max mem: 5990
Epoch: [87]  [10/31]  eta: 0:00:08  lr: 0.000081  loss: 0.0042 (0.0048)  time: 0.3883  data: 0.3289  max mem: 5990
Epoch: [87]  [20/31]  eta: 0:00:02  lr: 0.000081  loss: 0.0042 (0.0050)  time: 0.1369  data: 0.0807  max mem: 5990
Epoch: [87]  [30/31]  eta: 0:00:00  lr: 0.000081  loss: 0.0047 (0.0065)  time: 0.0923  data: 0.0387  max mem: 5990
Epoch: [87] Total time: 0:00:06 (0.2198 s / it)
Averaged stats: lr: 0.000081  loss: 0.0047 (0.0065)
Epoch: [88]  [ 0/31]  eta: 0:01:32  lr: 0.000071  loss: 0.0042 (0.0042)  time: 2.9968  data: 2.9037  max mem: 5990
Epoch: [88]  [10/31]  eta: 0:00:08  lr: 0.000071  loss: 0.0042 (0.0064)  time: 0.4191  data: 0.3532  max mem: 5990
Epoch: [88]  [20/31]  eta: 0:00:03  lr: 0.000071  loss: 0.0044 (0.0066)  time: 0.1613  data: 0.0967  max mem: 5990
Epoch: [88]  [30/31]  eta: 0:00:00  lr: 0.000071  loss: 0.0044 (0.0062)  time: 0.1104  data: 0.0476  max mem: 5990
Epoch: [88] Total time: 0:00:07 (0.2419 s / it)
Averaged stats: lr: 0.000071  loss: 0.0044 (0.0062)
Epoch: [89]  [ 0/31]  eta: 0:01:32  lr: 0.000062  loss: 0.0034 (0.0034)  time: 2.9678  data: 2.8876  max mem: 5990
Epoch: [89]  [10/31]  eta: 0:00:08  lr: 0.000062  loss: 0.0049 (0.0057)  time: 0.4246  data: 0.3620  max mem: 5990
Epoch: [89]  [20/31]  eta: 0:00:03  lr: 0.000062  loss: 0.0045 (0.0054)  time: 0.1403  data: 0.0789  max mem: 5990
Epoch: [89]  [30/31]  eta: 0:00:00  lr: 0.000062  loss: 0.0053 (0.0059)  time: 0.0863  data: 0.0242  max mem: 5990
Epoch: [89] Total time: 0:00:07 (0.2310 s / it)
Averaged stats: lr: 0.000062  loss: 0.0053 (0.0059)
Epoch: [90]  [ 0/31]  eta: 0:01:42  lr: 0.000054  loss: 0.0133 (0.0133)  time: 3.3077  data: 3.2278  max mem: 5990
Epoch: [90]  [10/31]  eta: 0:00:08  lr: 0.000054  loss: 0.0053 (0.0147)  time: 0.4186  data: 0.3615  max mem: 5990
Epoch: [90]  [20/31]  eta: 0:00:03  lr: 0.000054  loss: 0.0077 (0.0123)  time: 0.1224  data: 0.0683  max mem: 5990
Epoch: [90]  [30/31]  eta: 0:00:00  lr: 0.000054  loss: 0.0069 (0.0106)  time: 0.0846  data: 0.0309  max mem: 5990
Epoch: [90] Total time: 0:00:07 (0.2286 s / it)
Averaged stats: lr: 0.000054  loss: 0.0069 (0.0106)
Test:  [ 0/96]  eta: 0:07:03  loss: 0.2022 (0.2022)  acc1: 95.3125 (95.3125)  acc5: 98.4375 (98.4375)  time: 4.4107  data: 4.3794  max mem: 5990
Test:  [10/96]  eta: 0:01:01  loss: 0.4827 (0.4729)  acc1: 87.5000 (88.4943)  acc5: 96.8750 (96.8750)  time: 0.7169  data: 0.6864  max mem: 5990
Test:  [20/96]  eta: 0:00:41  loss: 0.4756 (0.4565)  acc1: 87.5000 (88.8393)  acc5: 96.8750 (96.9494)  time: 0.3466  data: 0.3162  max mem: 5990
Test:  [30/96]  eta: 0:00:29  loss: 0.4756 (0.4740)  acc1: 87.5000 (88.5081)  acc5: 96.8750 (96.8750)  time: 0.3099  data: 0.2796  max mem: 5990
Test:  [40/96]  eta: 0:00:24  loss: 0.3847 (0.4464)  acc1: 89.0625 (88.7576)  acc5: 98.4375 (97.2180)  time: 0.3160  data: 0.2857  max mem: 5990
Test:  [50/96]  eta: 0:00:18  loss: 0.3451 (0.4456)  acc1: 90.6250 (88.9400)  acc5: 98.4375 (97.1814)  time: 0.3219  data: 0.2916  max mem: 5990
Test:  [60/96]  eta: 0:00:14  loss: 0.3862 (0.4412)  acc1: 90.6250 (89.1137)  acc5: 96.8750 (97.2592)  time: 0.3108  data: 0.2806  max mem: 5990
Test:  [70/96]  eta: 0:00:10  loss: 0.3692 (0.4299)  acc1: 90.6250 (89.3046)  acc5: 96.8750 (97.2271)  time: 0.3495  data: 0.3192  max mem: 5990
Test:  [80/96]  eta: 0:00:05  loss: 0.3801 (0.4340)  acc1: 87.5000 (89.1782)  acc5: 96.8750 (97.1836)  time: 0.3204  data: 0.2902  max mem: 5990
Test:  [90/96]  eta: 0:00:02  loss: 0.4187 (0.4326)  acc1: 89.0625 (89.2857)  acc5: 96.8750 (97.1326)  time: 0.2738  data: 0.2436  max mem: 5990
Test:  [95/96]  eta: 0:00:00  loss: 0.3719 (0.4222)  acc1: 90.6250 (89.4823)  acc5: 96.8750 (97.1569)  time: 0.2731  data: 0.2436  max mem: 5990
Test: Total time: 0:00:33 (0.3511 s / it)
- Acc@1 89.482 Acc@5 97.157 loss 0.422
Accuracy of the network on the 6085 test images: 89.5%
Max accuracy: 89.52%
Epoch: [91]  [ 0/31]  eta: 0:01:34  lr: 0.000046  loss: 0.0046 (0.0046)  time: 3.0523  data: 2.9665  max mem: 5990
Epoch: [91]  [10/31]  eta: 0:00:09  lr: 0.000046  loss: 0.0051 (0.0109)  time: 0.4340  data: 0.3678  max mem: 5990
Epoch: [91]  [20/31]  eta: 0:00:03  lr: 0.000046  loss: 0.0056 (0.0099)  time: 0.1712  data: 0.1071  max mem: 5990
Epoch: [91]  [30/31]  eta: 0:00:00  lr: 0.000046  loss: 0.0058 (0.0086)  time: 0.1150  data: 0.0532  max mem: 5990
Epoch: [91] Total time: 0:00:07 (0.2465 s / it)
Averaged stats: lr: 0.000046  loss: 0.0058 (0.0086)
Epoch: [92]  [ 0/31]  eta: 0:01:33  lr: 0.000040  loss: 0.0046 (0.0046)  time: 3.0078  data: 2.9115  max mem: 5990
Epoch: [92]  [10/31]  eta: 0:00:08  lr: 0.000040  loss: 0.0054 (0.0067)  time: 0.4165  data: 0.3518  max mem: 5990
Epoch: [92]  [20/31]  eta: 0:00:03  lr: 0.000040  loss: 0.0054 (0.0087)  time: 0.1706  data: 0.1083  max mem: 5990
Epoch: [92]  [30/31]  eta: 0:00:00  lr: 0.000040  loss: 0.0055 (0.0107)  time: 0.1394  data: 0.0770  max mem: 5990
Epoch: [92] Total time: 0:00:08 (0.2624 s / it)
Averaged stats: lr: 0.000040  loss: 0.0055 (0.0107)
Epoch: [93]  [ 0/31]  eta: 0:01:40  lr: 0.000033  loss: 0.0041 (0.0041)  time: 3.2274  data: 3.1477  max mem: 5990
Epoch: [93]  [10/31]  eta: 0:00:09  lr: 0.000033  loss: 0.0070 (0.0266)  time: 0.4410  data: 0.3747  max mem: 5990
Epoch: [93]  [20/31]  eta: 0:00:03  lr: 0.000033  loss: 0.0053 (0.0169)  time: 0.1680  data: 0.1048  max mem: 5990
Epoch: [93]  [30/31]  eta: 0:00:00  lr: 0.000033  loss: 0.0042 (0.0136)  time: 0.1335  data: 0.0728  max mem: 5990
Epoch: [93] Total time: 0:00:08 (0.2678 s / it)
Averaged stats: lr: 0.000033  loss: 0.0042 (0.0136)
Epoch: [94]  [ 0/31]  eta: 0:01:42  lr: 0.000028  loss: 0.0069 (0.0069)  time: 3.2985  data: 3.2197  max mem: 5990
Epoch: [94]  [10/31]  eta: 0:00:09  lr: 0.000028  loss: 0.0055 (0.0059)  time: 0.4445  data: 0.3804  max mem: 5990
Epoch: [94]  [20/31]  eta: 0:00:03  lr: 0.000028  loss: 0.0053 (0.0058)  time: 0.1676  data: 0.1049  max mem: 5990
Epoch: [94]  [30/31]  eta: 0:00:00  lr: 0.000028  loss: 0.0054 (0.0073)  time: 0.1336  data: 0.0717  max mem: 5990
Epoch: [94] Total time: 0:00:08 (0.2687 s / it)
Averaged stats: lr: 0.000028  loss: 0.0054 (0.0073)
Epoch: [95]  [ 0/31]  eta: 0:01:37  lr: 0.000023  loss: 0.0068 (0.0068)  time: 3.1340  data: 3.0329  max mem: 5990
Epoch: [95]  [10/31]  eta: 0:00:08  lr: 0.000023  loss: 0.0057 (0.0126)  time: 0.4237  data: 0.3557  max mem: 5990
Epoch: [95]  [20/31]  eta: 0:00:03  lr: 0.000023  loss: 0.0052 (0.0106)  time: 0.1465  data: 0.0822  max mem: 5990
Epoch: [95]  [30/31]  eta: 0:00:00  lr: 0.000023  loss: 0.0039 (0.0084)  time: 0.1003  data: 0.0383  max mem: 5990
Epoch: [95] Total time: 0:00:07 (0.2385 s / it)
Averaged stats: lr: 0.000023  loss: 0.0039 (0.0084)
Epoch: [96]  [ 0/31]  eta: 0:00:48  lr: 0.000019  loss: 0.0038 (0.0038)  time: 1.5496  data: 1.4845  max mem: 5990
Epoch: [96]  [10/31]  eta: 0:00:04  lr: 0.000019  loss: 0.0042 (0.0045)  time: 0.2222  data: 0.1652  max mem: 5990
Epoch: [96]  [20/31]  eta: 0:00:01  lr: 0.000019  loss: 0.0050 (0.0054)  time: 0.0884  data: 0.0283  max mem: 5990
Epoch: [96]  [30/31]  eta: 0:00:00  lr: 0.000019  loss: 0.0051 (0.0056)  time: 0.0876  data: 0.0236  max mem: 5990
Epoch: [96] Total time: 0:00:04 (0.1488 s / it)
Averaged stats: lr: 0.000019  loss: 0.0051 (0.0056)
Epoch: [97]  [ 0/31]  eta: 0:00:41  lr: 0.000016  loss: 0.0038 (0.0038)  time: 1.3426  data: 1.2667  max mem: 5990
Epoch: [97]  [10/31]  eta: 0:00:04  lr: 0.000016  loss: 0.0045 (0.0068)  time: 0.2307  data: 0.1644  max mem: 5990
Epoch: [97]  [20/31]  eta: 0:00:01  lr: 0.000016  loss: 0.0045 (0.0070)  time: 0.1012  data: 0.0379  max mem: 5990
Epoch: [97]  [30/31]  eta: 0:00:00  lr: 0.000016  loss: 0.0046 (0.0063)  time: 0.0837  data: 0.0221  max mem: 5990
Epoch: [97] Total time: 0:00:04 (0.1493 s / it)
Averaged stats: lr: 0.000016  loss: 0.0046 (0.0063)
Epoch: [98]  [ 0/31]  eta: 0:00:40  lr: 0.000013  loss: 0.0043 (0.0043)  time: 1.3173  data: 1.2466  max mem: 5990
Epoch: [98]  [10/31]  eta: 0:00:04  lr: 0.000013  loss: 0.0055 (0.0070)  time: 0.2358  data: 0.1687  max mem: 5990
Epoch: [98]  [20/31]  eta: 0:00:01  lr: 0.000013  loss: 0.0055 (0.0066)  time: 0.1015  data: 0.0367  max mem: 5990
Epoch: [98]  [30/31]  eta: 0:00:00  lr: 0.000013  loss: 0.0060 (0.0069)  time: 0.0803  data: 0.0168  max mem: 5990
Epoch: [98] Total time: 0:00:04 (0.1493 s / it)
Averaged stats: lr: 0.000013  loss: 0.0060 (0.0069)
Epoch: [99]  [ 0/31]  eta: 0:00:52  lr: 0.000011  loss: 0.0060 (0.0060)  time: 1.6896  data: 1.6204  max mem: 5990
Epoch: [99]  [10/31]  eta: 0:00:04  lr: 0.000011  loss: 0.0044 (0.0058)  time: 0.2269  data: 0.1691  max mem: 5990
Epoch: [99]  [20/31]  eta: 0:00:01  lr: 0.000011  loss: 0.0044 (0.0079)  time: 0.0842  data: 0.0276  max mem: 5990
Epoch: [99]  [30/31]  eta: 0:00:00  lr: 0.000011  loss: 0.0050 (0.0104)  time: 0.0793  data: 0.0216  max mem: 5990
Epoch: [99] Total time: 0:00:04 (0.1459 s / it)
Averaged stats: lr: 0.000011  loss: 0.0050 (0.0104)
Test:  [ 0/96]  eta: 0:04:36  loss: 0.1926 (0.1926)  acc1: 95.3125 (95.3125)  acc5: 98.4375 (98.4375)  time: 2.8771  data: 2.8161  max mem: 5990
Test:  [10/96]  eta: 0:00:37  loss: 0.4995 (0.4798)  acc1: 85.9375 (87.9261)  acc5: 96.8750 (96.4489)  time: 0.4323  data: 0.3969  max mem: 5990
Test:  [20/96]  eta: 0:00:23  loss: 0.4949 (0.4559)  acc1: 87.5000 (88.4673)  acc5: 96.8750 (96.8750)  time: 0.1854  data: 0.1537  max mem: 5990
Test:  [30/96]  eta: 0:00:17  loss: 0.4289 (0.4743)  acc1: 87.5000 (88.2560)  acc5: 96.8750 (96.9254)  time: 0.1827  data: 0.1519  max mem: 5990
Test:  [40/96]  eta: 0:00:13  loss: 0.3898 (0.4464)  acc1: 89.0625 (88.6052)  acc5: 98.4375 (97.2942)  time: 0.1736  data: 0.1426  max mem: 5990
Test:  [50/96]  eta: 0:00:10  loss: 0.3649 (0.4479)  acc1: 90.6250 (88.5723)  acc5: 96.8750 (97.2120)  time: 0.1791  data: 0.1482  max mem: 5990
Test:  [60/96]  eta: 0:00:08  loss: 0.3942 (0.4422)  acc1: 90.6250 (88.8320)  acc5: 96.8750 (97.2592)  time: 0.1810  data: 0.1505  max mem: 5990
Test:  [70/96]  eta: 0:00:05  loss: 0.3442 (0.4323)  acc1: 90.6250 (89.1065)  acc5: 96.8750 (97.2271)  time: 0.2073  data: 0.1770  max mem: 5990
Test: [80/96] eta: 0:00:03 loss: 0.3416 (0.4361) acc1: 90.6250 (89.0239) acc5: 96.8750 (97.2029) time: 0.1988 data: 0.1685 max mem: 5990 Test: [90/96] eta: 0:00:01 loss: 0.4502 (0.4366) acc1: 90.6250 (89.1312) acc5: 96.8750 (97.1326) time: 0.1101 data: 0.0799 max mem: 5990 Test: [95/96] eta: 0:00:00 loss: 0.3636 (0.4248) acc1: 90.6250 (89.3180) acc5: 98.4375 (97.2062) time: 0.1093 data: 0.0799 max mem: 5990 Test: Total time: 0:00:18 (0.1971 s / it)
- Acc@1 89.318 Acc@5 97.206 loss 0.425 Accuracy of the network on the 6085 test images: 89.3% Max accuracy: 89.52% Training time 0:18:17
There are two things you can try to pinpoint the source of the unstable training performance:
- Try fixing the dataloader behavior so that every worker process is seeded deterministically:
import random
import numpy as np
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # give each worker a fixed, distinct seed so data augmentation is reproducible
    np.random.seed(42 + worker_id)
    random.seed(42 + worker_id)

train_loader = DataLoader(dataset, batch_size=32, shuffle=True,
                          num_workers=4, worker_init_fn=worker_init_fn)
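As a quick sanity check (a standalone sketch, not part of this repo), the per-worker seeding can be verified without the training loop: re-seeding with the same worker id must reproduce the same random sequence, while different worker ids give different streams:

```python
import random
import numpy as np

def worker_init_fn(worker_id):
    # same per-worker seeding as suggested for the DataLoader
    np.random.seed(42 + worker_id)
    random.seed(42 + worker_id)

def draw(worker_id, n=3):
    # re-seed as worker `worker_id` would, then draw a few numbers
    worker_init_fn(worker_id)
    return [random.random() for _ in range(n)]

print(draw(0) == draw(0))  # True: same seed -> identical sequence
print(draw(0) == draw(1))  # False: different seeds -> different sequences
```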
- Try training without `--pyra --separate_lr_for_pyra --pyra_lr=${pyra_lr}` first, so that you train with token merging only. See whether the instability comes from the token merging process.
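To localize where two seeded runs first diverge, one generic trick (a hypothetical helper, not from this repo; `fingerprint` and the tensors used are illustrative) is to hash intermediate outputs byte-for-byte, since run-to-run differences from non-deterministic kernels usually start far below printing precision:

```python
import hashlib
import numpy as np

def fingerprint(arr):
    # Hash the exact bytes of an array, so even a one-ULP
    # floating-point difference between two runs is detected.
    arr = np.ascontiguousarray(arr)
    return hashlib.sha256(arr.tobytes()).hexdigest()

# Usage sketch: under a fixed seed, run the same forward pass twice,
# move each block's output to CPU (e.g. out.detach().cpu().numpy()),
# and compare fingerprints block by block to find where runs diverge.
a = np.ones((4, 8), dtype=np.float32)
b = a.copy()
b[0, 0] += np.float32(1e-6)  # a tiny perturbation changes the hash

print(fingerprint(a) == fingerprint(a))  # True
print(fingerprint(a) == fingerprint(b))  # False
```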
I have tried it, and experiments show that the performance instability comes from token merging.
Try fixing the DataLoader seeding while running with plain token merging, and see whether that resolves the instability.
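For context on why token merging in particular can break run-to-run reproducibility even with a fixed seed: merging averages token features with reduction/scatter ops, and on GPUs the order of the partial sums can vary between runs. Since float32 addition is not associative, a different reduction order yields a slightly different average, and that difference compounds over training. A small NumPy sketch of the order sensitivity (illustrative only, not the repo's merging code):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

# Two ways of reducing the same numbers in float32:
pairwise = np.float32(np.sum(x))       # NumPy's pairwise reduction
sequential = np.float32(0.0)
for v in x:                            # naive left-to-right reduction
    sequential = np.float32(sequential + v)

# Reference value in (near-)exact arithmetic.
exact = math.fsum(float(v) for v in x)

# The two float32 results typically differ in the last bits; on a GPU
# the reduction order can also change between runs, so merged-token
# averages drift even when every random seed is fixed.
print(pairwise, sequential, exact)
```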