permute-quantize-finetune

The precision of ResNet18 is .0?

Open talenz opened this issue 3 years ago • 12 comments

The command I ran is "python -m src.train_resnet --config ../config/train_resnet18.yaml", and the accuracy is 0.0 after finetuning. Any idea what's causing it?

Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167)
Validation Epoch #9: 100%|████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]
Validation Epoch #9: loss (7.09), accuracy (0.08)
Done training!

talenz avatar Jun 08 '21 01:06 talenz
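
For context, the numbers reported above can be compared against chance level. The sketch below assumes the standard 1000-class ImageNet setup and a cross-entropy loss; it only restates the figures from the log and is not part of the repository.

```python
import math

NUM_CLASSES = 1000  # assumption: standard ImageNet-1k label set

# A classifier that guesses uniformly at random gets 1/NUM_CLASSES right
# and has a cross-entropy loss of ln(NUM_CLASSES).
chance_accuracy = 100.0 / NUM_CLASSES   # in percent
chance_loss = math.log(NUM_CLASSES)

# Figures copied from the log line "accuracy=0.08% (41/50000)".
reported_accuracy = 100.0 * 41 / 50000  # in percent
reported_loss = 7.09

print(f"chance:   {chance_accuracy:.2f}% accuracy, loss {chance_loss:.2f}")
print(f"reported: {reported_accuracy:.2f}% accuracy, loss {reported_loss:.2f}")
```

Under these assumptions the finetuned model is at or slightly below chance, which suggests the training run is actively destroying the weights rather than merely failing to improve them.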

Huh. Can you please share the entire log? Is this with a single GPU?

una-dinosauria avatar Jun 08 '21 01:06 una-dinosauria

Hey, I was using a single GPU. Where can I find the entire log?

talenz avatar Jun 08 '21 01:06 talenz

INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0 INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds layer2.0.downsample.0 2.301621e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. 
Done in 2.30 seconds INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10 INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0 INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds layer3.0.downsample.0 1.078800e-12: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12 INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0 INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds layer4.0.downsample.0 7.169530e-13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. Done in 16.70 seconds INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13 INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents INFO:[2021/06/07 17:24:26] Optimizing permutation for fc INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. 
Done in 0.08 seconds fc 5.158992e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10

talenz avatar Jun 08 '21 01:06 talenz

Yep that looks like the log, but definitely not all of it.

una-dinosauria avatar Jun 08 '21 02:06 una-dinosauria

INFO:[2021/06/07 17:23:58] { "dataloader": { "batch_size": 128, "imagenet_path": "imagenet", "num_workers": 20, "train_shuffle": true, "validation_shuffle": false }, "epochs": 9, "learning_rate": 0.001, "lr_scheduler": { "min_lr": 1e-06, "type": "cosine" }, "model": { "arch": "resnet18", "compression_parameters": { "fc_subvector_size": 4, "ignored_modules": [ "conv1" ], "k": 256, "k_means_n_iters": 10, "k_means_type": "src", "large_subvectors": false, "layer_specs": { "fc": { "k": 2048, "k_means_type": "src" } }, "pw_subvector_size": 4 }, "permutations": [ [ { "parents": [ "layer1.0.conv1", "layer1.0.bn1" ] }, { "children": [ "layer1.0.conv2" ] } ], [ { "parents": [ "layer1.1.conv1", "layer1.1.bn1" ] }, { "children": [ "layer1.1.conv2" ] } ], [ { "parents": [ "layer2.0.conv1", "layer2.0.bn1" ] }, { "children": [ "layer2.0.conv2" ] } ], [ { "parents": [ "layer2.1.conv1", "layer2.1.bn1" ] }, { "children": [ "layer2.1.conv2" ] } ], [ { "parents": [ "layer3.0.conv1", "layer3.0.bn1" ] }, { "children": [ "layer3.0.conv2" ] } ], [ { "parents": [ "layer3.1.conv1", "layer3.1.bn1" ] }, { "children": [ "layer3.1.conv2" ] } ], [ { "parents": [ "layer4.0.conv1", "layer4.0.bn1" ] }, { "children": [ "layer4.0.conv2" ] } ], [ { "parents": [ "layer4.1.conv1", "layer4.1.bn1" ] }, { "children": [ "layer4.1.conv2" ] } ], [ { "parents": [ "conv1", "bn1", "layer1.0.conv2", "layer1.0.bn2", "layer1.1.conv2", "layer1.1.bn2" ] }, { "children": [ "layer1.0.conv1", "layer1.1.conv1", "layer2.0.conv1", "layer2.0.downsample.0" ] } ], [ { "parents": [ "layer2.0.downsample.0", "layer2.0.downsample.1", "layer2.0.conv2", "layer2.0.bn2", "layer2.1.conv2", "layer2.1.bn2" ] }, { "children": [ "layer2.1.conv1", "layer3.0.conv1", "layer3.0.downsample.0" ] } ], [ { "parents": [ "layer3.0.downsample.0", "layer3.0.downsample.1", "layer3.0.conv2", "layer3.0.bn2", "layer3.1.conv2", "layer3.1.bn2" ] }, { "children": [ "layer3.1.conv1", "layer4.0.conv1", "layer4.0.downsample.0" ] } ], [ { "parents": [ 
"layer4.0.downsample.0", "layer4.0.downsample.1", "layer4.0.conv2", "layer4.0.bn2", "layer4.1.conv2", "layer4.1.bn2" ] }, { "children": [ "layer4.1.conv1", "fc" ] } ] ], "sls_iterations": 10000, "use_permutations": true }, "momentum": 0.9, "optimizer": "adam", "output_path": "<your_output_path_here>", "skip_initial_validation": false, "weight_decay": 0.0001 } INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer2.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer3.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.0.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer4.1.conv2']) with 2 parents INFO:[2021/06/07 17:24:01] None of the layers are optimizable. Skipping. 
INFO:[2021/06/07 17:24:01] Optimizing permutation for odict_keys(['layer1.0.conv1', 'layer1.1.conv1', 'layer2.0.conv1', 'layer2.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:01] Optimizing permutation for layer2.0.downsample.0 INFO:[2021/06/07 17:24:01] Greedy: 5.529302e-10 -> 2.744337e-10. Done in 0.01 seconds layer2.0.downsample.0 2.301621e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:02<00:00, 4359.13it/s]INFO:[2021/06/07 17:24:04] SLS : 2.744337e-10 -> 2.301621e-10. Done in 2.30 seconds INFO:[2021/06/07 17:24:04] layer2.0.downsample.0: prev covdet 5.529302e-10, new covdet: 2.301621e-10 INFO:[2021/06/07 17:24:04] Optimizing permutation for odict_keys(['layer2.1.conv1', 'layer3.0.conv1', 'layer3.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:04] Optimizing permutation for layer3.0.downsample.0 INFO:[2021/06/07 17:24:04] Greedy: 1.373051e-12 -> 1.138951e-12. Done in 0.02 seconds layer3.0.downsample.0 1.078800e-12: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:05<00:00, 1763.26it/s]INFO:[2021/06/07 17:24:09] SLS : 1.138951e-12 -> 1.078800e-12. Done in 5.67 seconds INFO:[2021/06/07 17:24:09] layer3.0.downsample.0: prev covdet 1.373051e-12, new covdet: 1.078800e-12 INFO:[2021/06/07 17:24:09] Optimizing permutation for odict_keys(['layer3.1.conv1', 'layer4.0.conv1', 'layer4.0.downsample.0']) with 6 parents INFO:[2021/06/07 17:24:09] Optimizing permutation for layer4.0.downsample.0 INFO:[2021/06/07 17:24:10] Greedy: 1.327052e-12 -> 7.464063e-13. Done in 0.04 seconds layer4.0.downsample.0 7.169530e-13: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:16<00:00, 598.93it/s]INFO:[2021/06/07 17:24:26] SLS : 7.464063e-13 -> 7.169530e-13. 
Done in 16.70 seconds INFO:[2021/06/07 17:24:26] layer4.0.downsample.0: prev covdet 1.327052e-12, new covdet: 7.169530e-13 INFO:[2021/06/07 17:24:26] Optimizing permutation for odict_keys(['layer4.1.conv1', 'fc']) with 6 parents INFO:[2021/06/07 17:24:26] Optimizing permutation for fc INFO:[2021/06/07 17:24:26] Greedy: 5.416843e-10 -> 5.294320e-10. Done in 0.08 seconds fc 5.158992e-10: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:14<00:00, 134.48it/s]INFO:[2021/06/07 17:25:41] SLS : 5.294320e-10 -> 5.158992e-10. Done in 74.37 seconds INFO:[2021/06/07 17:25:41] fc: prev covdet 5.416843e-10, new covdet: 5.158992e-10 INFO:[2021/06/07 17:25:41] layer1.0.conv1 compression: 10; mse: 4.434068e-03; codebook size: 256 x 9; code size: 64 x 64 INFO:[2021/06/07 17:25:41] layer1.0.conv2 compression: 10; mse: 3.580092e-03; codebook size: 256 x 9; code size: 64 x 64 INFO:[2021/06/07 17:25:41] layer1.1.conv1 compression: 10; mse: 4.652465e-03; codebook size: 256 x 9; code size: 64 x 64 INFO:[2021/06/07 17:25:41] layer1.1.conv2 compression: 10; mse: 3.743610e-03; codebook size: 256 x 9; code size: 64 x 64 INFO:[2021/06/07 17:25:41] layer2.0.conv1 compression: 10; mse: 3.415691e-03; codebook size: 256 x 9; code size: 128 x 64 INFO:[2021/06/07 17:25:41] layer2.0.conv2 compression: 10; mse: 2.482994e-03; codebook size: 256 x 9; code size: 128 x 128 INFO:[2021/06/07 17:25:41] layer2.0.downsample.0 compression: 10; mse: 1.262195e-03; codebook size: 256 x 4; code size: 128 x 16 INFO:[2021/06/07 17:25:41] layer2.1.conv1 compression: 10; mse: 2.888829e-03; codebook size: 256 x 9; code size: 128 x 128 INFO:[2021/06/07 17:25:41] layer2.1.conv2 compression: 10; mse: 2.073620e-03; codebook size: 256 x 9; code size: 128 x 128 INFO:[2021/06/07 17:25:41] layer3.0.conv1 compression: 10; mse: 1.135532e-03; codebook size: 256 x 9; code size: 256 x 128 INFO:[2021/06/07 
17:25:41] layer3.0.conv2 compression: 10; mse: 1.265556e-03; codebook size: 256 x 9; code size: 256 x 256 INFO:[2021/06/07 17:25:41] layer3.0.downsample.0 compression: 10; mse: 3.959292e-04; codebook size: 256 x 4; code size: 256 x 32 INFO:[2021/06/07 17:25:41] layer3.1.conv1 compression: 10; mse: 1.148529e-03; codebook size: 256 x 9; code size: 256 x 256 INFO:[2021/06/07 17:25:41] layer3.1.conv2 compression: 10; mse: 8.906376e-04; codebook size: 256 x 9; code size: 256 x 256 INFO:[2021/06/07 17:25:41] layer4.0.conv1 compression: 10; mse: 6.173717e-04; codebook size: 256 x 9; code size: 512 x 256 INFO:[2021/06/07 17:25:41] layer4.0.conv2 compression: 10; mse: 6.496188e-04; codebook size: 256 x 9; code size: 512 x 512 INFO:[2021/06/07 17:25:41] layer4.0.downsample.0 compression: 10; mse: 3.887021e-04; codebook size: 256 x 4; code size: 512 x 64 INFO:[2021/06/07 17:25:41] layer4.1.conv1 compression: 10; mse: 4.462329e-04; codebook size: 256 x 9; code size: 512 x 512 INFO:[2021/06/07 17:25:41] layer4.1.conv2 compression: 10; mse: 6.987687e-05; codebook size: 256 x 9; code size: 512 x 512 INFO:[2021/06/07 17:25:44] fc compression: 10; mse: 6.214550e-04; codebook size: 2048 x 4; code size: 1000 x 128 INFO:[2021/06/07 17:25:44] uncompressed (bits): 374064384 compressed (bits): 12927232 uncompressed (MB): 44.59 compressed (MB): 1.54 compression ratio: 28.94 Validation Epoch #0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.75it/s, loss=6.29, accuracy=5.57% (2784/50000)]Validation Epoch #0: loss (6.29), accuracy (5.57) Training Epoch #1: 78%|██████████████████████████████████████████████████████████████████████████████████████████████████ | 7790/10010 [16:05<04:49, 7.68it/s, loss=7.2, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. 
warnings.warn(str(msg)) Training Epoch #1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:40<00:00, 8.07it/s, loss=9.66, accuracy=0]Training Epoch #1: loss: 7.78, accuracy: 0.07% (937/1281167) Validation Epoch #1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:43<00:00, 8.91it/s, loss=8.83, accuracy=0.08% (39/50000)]Validation Epoch #1: loss (8.83), accuracy (0.08) Training Epoch #2: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7794/10010 [16:26<04:52, 7.57it/s, loss=6.09, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) Training Epoch #2: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [21:06<00:00, 7.90it/s, loss=8.14, accuracy=0]Training Epoch #2: loss: 6.99, accuracy: 0.00% (2/1281167) Validation Epoch #2: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:39<00:00, 9.83it/s, loss=8.17, accuracy=0.12% (58/50000)]Validation Epoch #2: loss (8.17), accuracy (0.12) Training Epoch #3: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:02<04:25, 8.36it/s, loss=7.01, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. 
warnings.warn(str(msg)) Training Epoch #3: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:35<00:00, 8.10it/s, loss=8.94, accuracy=0]Training Epoch #3: loss: 6.59, accuracy: 0.01% (138/1281167) Validation Epoch #3: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:35<00:00, 11.10it/s, loss=9.10, accuracy=0.08% (40/50000)]Validation Epoch #3: loss (9.10), accuracy (0.08) Training Epoch #4: 78%|██████████████████████████████████████████████████████████████████████████████████████████████████ | 7790/10010 [16:09<04:28, 8.27it/s, loss=6.7, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) Training Epoch #4: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:44<00:00, 8.04it/s, loss=6.46, accuracy=0]Training Epoch #4: loss: 6.44, accuracy: 0.02% (262/1281167) Validation Epoch #4: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.77it/s, loss=9.12, accuracy=0.10% (49/50000)]Validation Epoch #4: loss (9.12), accuracy (0.10) Training Epoch #5: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:07<04:29, 8.24it/s, loss=6.06, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. 
warnings.warn(str(msg)) Training Epoch #5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.62, accuracy=0]Training Epoch #5: loss: 6.43, accuracy: 0.02% (239/1281167) Validation Epoch #5: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:47<00:00, 8.29it/s, loss=9.33, accuracy=0.10% (52/50000)]Validation Epoch #5: loss (9.33), accuracy (0.10) Training Epoch #6: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:04<04:20, 8.53it/s, loss=6.73, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) Training Epoch #6: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=12.6, accuracy=0]Training Epoch #6: loss: 6.22, accuracy: 0.09% (1117/1281167) Validation Epoch #6: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:35<00:00, 11.08it/s, loss=9.18, accuracy=0.09% (46/50000)]Validation Epoch #6: loss (9.18), accuracy (0.09) Training Epoch #7: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:02<04:41, 7.88it/s, loss=7.38, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. 
warnings.warn(str(msg)) Training Epoch #7: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:36<00:00, 8.10it/s, loss=6.59, accuracy=0]Training Epoch #7: loss: 6.33, accuracy: 0.06% (776/1281167) Validation Epoch #7: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:42<00:00, 9.20it/s, loss=8.28, accuracy=0.10% (50/50000)]Validation Epoch #7: loss (8.28), accuracy (0.10) Training Epoch #8: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:07<04:30, 8.21it/s, loss=7.35, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) Training Epoch #8: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:43<00:00, 8.05it/s, loss=5.31, accuracy=0]Training Epoch #8: loss: 6.73, accuracy: 0.06% (753/1281167) Validation Epoch #8: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:34<00:00, 11.38it/s, loss=8.29, accuracy=0.10% (52/50000)]Validation Epoch #8: loss (8.29), accuracy (0.10) Training Epoch #9: 78%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7790/10010 [16:12<04:31, 8.17it/s, loss=7.36, accuracy=0]/usr/local/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:793: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. 
warnings.warn(str(msg)) Training Epoch #9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10010/10010 [20:46<00:00, 8.03it/s, loss=6.63, accuracy=0]Training Epoch #9: loss: 7.25, accuracy: 0.02% (304/1281167) Validation Epoch #9: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 391/391 [00:36<00:00, 10.68it/s, loss=7.09, accuracy=0.08% (41/50000)]Validation Epoch #9: loss (7.09), accuracy (0.08) Done training!

This should be the whole log.

talenz avatar Jun 08 '21 02:06 talenz

It seems like the accuracy after the initial compression is too low (5%). I'll try to reproduce on my end, thanks for bringing this up!

una-dinosauria avatar Jun 08 '21 03:06 una-dinosauria

Thanks! Waiting for your info~

talenz avatar Jun 08 '21 03:06 talenz

I'm seeing a similar situation: the accuracy stays at 5% and does not change!

lewin4 avatar Aug 03 '21 03:08 lewin4

Are you seeing this with other models too, or just with ResNet18? Could any of you provide a docker image to reproduce your error? (I should have provided one to reproduce our experiments, sorry about that.)

una-dinosauria avatar Aug 03 '21 23:08 una-dinosauria

Hello @talenz and @lewin4,

I have managed to reproduce the issue that you are reporting.

I apologize. Because we developed this code on machines running distributed training with Horovod, we missed a bug in the dataloader: as written, the ImageNet training dataloader does not shuffle the training data.

You can replace

https://github.com/uber-research/permute-quantize-finetune/blob/53a30bade862769f4f0523e38c4ee1333a2e48b1/src/dataloading/imagenet_loader.py#L70

with

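loader = DataLoader(
    dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    # Shuffle only when no sampler is given; DataLoader raises an error
    # if shuffle=True is combined with a sampler.
    shuffle=(sampler is None),
    sampler=sampler,
    pin_memory=True,
)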

and that should bring back training numbers that make sense.
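
To illustrate why a missing shuffle can be this damaging, here is a minimal pure-Python sketch. It assumes the training samples are stored class by class (as folder-based image datasets typically enumerate them); the toy sizes and the helper `distinct_classes_per_batch` are hypothetical and not taken from the repository.

```python
import random


def distinct_classes_per_batch(labels, batch_size):
    """Return the number of distinct labels in each consecutive batch."""
    return [
        len(set(labels[i:i + batch_size]))
        for i in range(0, len(labels), batch_size)
    ]


# Hypothetical toy dataset: 10 classes x 1300 samples, ordered class by class.
num_classes, per_class, batch_size = 10, 1300, 128
labels = [c for c in range(num_classes) for _ in range(per_class)]

# Without shuffling, batches are read in storage order.
sequential = distinct_classes_per_batch(labels, batch_size)

# With shuffling, each batch is drawn from a random permutation.
rng = random.Random(0)
shuffled_labels = labels[:]
rng.shuffle(shuffled_labels)
shuffled = distinct_classes_per_batch(shuffled_labels, batch_size)

print("max distinct classes per batch (sequential):", max(sequential))
print("max distinct classes per batch (shuffled):  ", max(shuffled))
```

Under this assumption, every unshuffled batch contains at most two classes, so each gradient step pushes the network toward predicting whichever class it is currently seeing. That would be consistent with the collapse to chance-level accuracy in the log above.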

Also, please note that I don't have write access to this repo anymore, so I will push patches to my personal fork at https://github.com/una-dinosauria/permute-quantize-finetune/. I'll let you know here when that repo is patched.

Again, sorry for this mistake, and thank you so much for reporting this issue.

Cheers,

una-dinosauria avatar Sep 07 '21 00:09 una-dinosauria

Fixed by https://github.com/una-dinosauria/permute-quantize-finetune/pull/2.

Cheers,

una-dinosauria avatar Sep 07 '21 00:09 una-dinosauria

I've also added a docker image to make it easier to reproduce our results: https://github.com/una-dinosauria/permute-quantize-finetune/pull/3

una-dinosauria avatar Sep 07 '21 01:09 una-dinosauria