Prototype references
I don't want to merge this PR. It's more like a feature branch that we can use for discussion. For the actual port, we can either clean up this PR or use it as a starting point for a new one.
I've added the needed changes for the detection references.
I've run the references for a few iterations with the following parameters to confirm they work:
Classification:
[
    "--device=cpu",
    "--batch-size=2",
    "--epochs=1",
    "--workers=2",
    "--mixup-alpha=0.5",
    "--cutmix-alpha=0.5",
    "--auto-augment=ra",  # one of: "ra", "ta_wide", "augmix", "imagenet", "cifar10", "svhn"
    "--random-erase=1.0",
]
Detection:
[
    "--device=cpu",
    "--batch-size=2",
    "--epochs=1",
    "--workers=2",
    "--data-augmentation=hflip",  # one of: "hflip", "lsj", "multiscale", "ssd", "ssdlite"
    # "--use-copypaste",  # only if data_augmentation == "lsj"
]
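For reference, a minimal sketch of turning one of the flag lists above into a runnable command; the script path `references/classification/train.py` is an assumption based on the usual layout of the torchvision reference scripts, not something verified here:

```python
import shlex

# Flags copied from the classification smoke test above.
flags = [
    "--device=cpu", "--batch-size=2", "--epochs=1", "--workers=2",
    "--mixup-alpha=0.5", "--cutmix-alpha=0.5",
    "--auto-augment=ra", "--random-erase=1.0",
]
# Hypothetical script path; adjust to wherever the reference script lives.
cmd = ["python", "references/classification/train.py", *flags]
print(shlex.join(cmd))
```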
~~Detection references are affected by #6528. Do not train before this is merged.~~
PIL Backend
I'm doing the following runs to confirm the validity of the PIL backend.
Classification
Augmentation: ta_wide + random erasing + mixup + cutmix
Target Acc: 80.854 / 95.428 - time: 1 day, 18:06:05 - jobid: experiments/PR5201/resnet50_sota2/13576
Using githash 4d73fe7:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 59857
Test: EMA Acc@1 80.668 Acc@5 95.258
Training time 1 day, 23:14:27
Result: The accuracy is within the expected bounds; the training time appears to have increased.
Augmentation: aa + random erasing
Target Acc: 67.620 / 87.404 - time: 2 days, 17:09:20 - jobid: jobs/PR3354/classification/35753749
Using githash ec120ff:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 60686
Test: Acc@1 66.776 Acc@5 86.790
Training time 1 day, 19:07:42
Submitted job_id: 60977
Test: Acc@1 65.830 Acc@5 86.190
Training time 1 day, 17:33:59
Submitted job_id: 60978
Test: Acc@1 66.952 Acc@5 86.824
Training time 1 day, 18:18:25
Result: The accuracy is lower than expected. I checked my notes, and this specific model had high variance in accuracy when we originally trained it (three acc@1 values from different runs: 66.240, 67.256, 67.620). So it's possible this is OK, but it's worth confirming. The training times are not comparable because the old models were trained on different hardware.
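To put the variance claim in numbers, a quick sketch using the acc@1 values quoted above (three v2 runs from this comment, three original runs from my notes):

```python
import statistics

# acc@1 from the three v2 runs above and from the original trainings.
v2_runs = [66.776, 65.830, 66.952]
original_runs = [66.240, 67.256, 67.620]

for name, runs in [("v2", v2_runs), ("original", original_runs)]:
    print(name, round(statistics.mean(runs), 3), round(statistics.stdev(runs), 3))
```

Both spreads are over half a point of acc@1, so the v2 numbers fall within the historical run-to-run noise.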
Detection
Augmentation: multiscale
Target Acc: 0.415 - time: 9:54:20 - jobid: experiments/PR5444/34702
Using githash 7cb08d5:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60097
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.414
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.616
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.438
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.258
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.454
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.535
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.338
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.545
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.588
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.419
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.728
Training time 9:54:21
Result: The accuracy looks as expected and so does the training time.
Augmentation: ssdlite
Target Acc: 0.212 - time: 1 day, 4:11:22 - jobid: jobs/PR3757/2nd_training/41046786
Using githash ec120ff:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60972
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.342
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.207
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.331
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.043
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
Training time 1 day, 16:39:09
Result: The accuracy looks as expected. The training time seems significantly increased despite using better hardware and faster IO to load the data. Definitely worth investigating.
Augmentation: ssd
Target Acc: 0.251 - time: 1 day, 3:40:14 - jobid: jobs/PR3403/4th_training/40773612
Using githash ec120ff:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60650
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.252
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.418
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.261
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.239
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.367
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.089
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.408
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
Training time 16:01:12
Result: The accuracy looks as expected. The training times can't be compared because we ran on better hardware.
Augmentation: lsj + copypaste
Target Acc: 0.473 / 0.417 - time: 3 days, 19:00:54 - jobid: experiments/PR5825/22644
Using githash ec120ff:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60654
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.456
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.656
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.496
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.309
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.493
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.360
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.574
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.604
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.644
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.755
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.402
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.630
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.585
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.327
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.512
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.537
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.353
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.698
Training time 3 days, 14:05:05
Result: The accuracy is significantly reduced; it's worth reviewing the implementation. The training time seems reduced.
PIL Backend
Due to the bug discovered in #6541, we need to repeat the experiments that involve LSJ.
Augmentation: lsj + copypaste
Target Acc: 0.473 / 0.417 - time: 3 days, 19:00:54 - jobid: experiments/PR5825/22644
Using githash 49e653f:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 61716
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.474
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.678
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.517
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.621
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.370
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.590
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.619
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.460
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.416
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.648
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.449
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.224
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.444
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.526
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.550
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.588
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.711
Training time 3 days, 14:04:04
Result: The accuracy looks as expected. The training time seems reduced.
To confirm there is no bug in AA, we will run the same tests on the current main branch and check whether the results match.
Augmentation: aa + random erasing
Target Acc: 67.620 / 87.404 - time: 2 days, 17:09:20 - jobid: jobs/PR3354/classification/35753749
Using main:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 62122
Test: Acc@1 66.904 Acc@5 86.898
Training time 1 day, 19:30:46
Submitted job_id: 62123
Test: Acc@1 66.620 Acc@5 87.036
Training time 1 day, 17:31:04
Submitted job_id: 62124
Test: Acc@1 66.644 Acc@5 86.540
Training time 1 day, 17:47:35
Result: The accuracy on the main branch matches the v2 accuracy reported above. The execution time varies a lot due to OnTap performance, so it's hard to make comparisons.
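A quick check on the averages, with the acc@1 values copied from the v2 runs and the main-branch runs above:

```python
import statistics

v2_runs = [66.776, 65.830, 66.952]    # jobs 60686 / 60977 / 60978
main_runs = [66.904, 66.620, 66.644]  # jobs 62122 / 62123 / 62124
diff = statistics.mean(main_runs) - statistics.mean(v2_runs)
print(round(statistics.mean(v2_runs), 3), round(statistics.mean(main_runs), 3), round(diff, 3))
```

The means differ by about 0.2 points of acc@1, well within the ~0.6-point standard deviation of individual runs.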
@pmeier @vfdev-5 I think we just confirmed that v2 produces the same model accuracy as v1. GGs! :) I don't think we can easily make claims about speed because the jobs are heavily affected by the IO speed of OnTap; we will rely on Victor's benchmarks instead.
PIL Backend
Segmentation
Target Acc: 91.2 / 57.9 - time: 3:13:11 - jobid: jobs/PR3276/3rd_training/35354083
Using githash a2893a1:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 64765
global correct: 87.9
average row correct: ['93.1', '41.7', '54.4', '52.6', '35.2', '13.2', '68.4', '38.3', '76.9', '5.8', '86.2', '56.9', '35.6', '54.5', '80.8', '83.1', '33.5', '30.6', '42.4', '65.7', '38.6']
IoU: ['86.9', '29.5', '51.4', '26.7', '25.0', '11.0', '62.8', '34.6', '45.2', '5.5', '47.0', '32.1', '26.6', '45.2', '66.6', '71.3', '20.0', '27.2', '33.2', '55.5', '34.8']
mean IoU: 39.9
Training time 4:12:37
Submitted job_id: 64766
global correct: 88.0
average row correct: ['93.4', '40.9', '59.2', '49.2', '33.6', '2.5', '70.5', '39.6', '76.7', '3.1', '82.3', '55.6', '45.6', '53.0', '79.9', '83.2', '41.9', '39.5', '32.1', '66.8', '38.2']
IoU: ['87.0', '24.6', '55.4', '29.5', '24.7', '2.3', '66.4', '36.4', '47.5', '3.0', '43.8', '32.5', '35.5', '40.8', '68.5', '71.2', '27.2', '33.5', '26.1', '58.1', '28.4']
mean IoU: 40.1
Training time 2:16:02
Submitted job_id: 64768
global correct: 87.5
average row correct: ['92.9', '41.0', '53.5', '50.8', '34.7', '6.5', '63.5', '38.3', '78.1', '5.4', '75.1', '55.3', '45.6', '52.1', '76.2', '82.3', '32.7', '40.4', '35.2', '69.4', '42.4']
IoU: ['86.5', '33.2', '50.2', '34.1', '24.6', '5.3', '59.9', '34.1', '44.7', '5.1', '46.1', '30.3', '32.4', '43.6', '64.1', '70.4', '22.5', '33.9', '27.8', '51.9', '37.3']
mean IoU: 39.9
Training time 2:17:09
Result: There seems to be an issue with the accuracy; we need to investigate. The speed looks faster, mostly because of the new hardware. Our cluster shows quite high variance in execution times, possibly due to IO bandwidth.
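As a sanity check on how the summary metric is derived, the reported mean IoU is simply the average of the 21 per-class IoU values (sketch using the first run, job 64765):

```python
# Per-class IoU values from job 64765 above (21 classes incl. background).
iou = [86.9, 29.5, 51.4, 26.7, 25.0, 11.0, 62.8, 34.6, 45.2, 5.5, 47.0,
       32.1, 26.6, 45.2, 66.6, 71.3, 20.0, 27.2, 33.2, 55.5, 34.8]
mean_iou = sum(iou) / len(iou)
print(round(mean_iou, 1))  # → 39.9, matching the reported mean IoU
```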
PIL Backend
Due to a discrepancy discovered in the padding of RandomCrop, we reran the experiments:
Segmentation without pre-trained backbones
~~Target Acc: 91.2 / 57.9 - time: 3:13:11 - jobid: jobs/PR3276/3rd_training/35354083~~
Using githash 5a311b3:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 67061
global correct: 87.8
average row correct: ['93.3', '41.0', '47.7', '59.4', '34.9', '3.9', '67.9', '35.7', '74.4', '7.6', '86.4', '51.9', '41.7', '56.4', '80.9', '83.1', '31.8', '28.6', '38.3', '73.2', '36.9']
IoU: ['86.8', '24.6', '45.2', '35.5', '26.0', '3.4', '65.0', '31.9', '47.1', '7.2', '47.2', '29.8', '28.9', '47.4', '65.4', '71.2', '18.9', '25.1', '30.6', '60.5', '32.2']
mean IoU: 39.5
Training time 2:59:50
Submitted job_id: 67062
global correct: 87.7
average row correct: ['93.2', '36.0', '53.1', '48.2', '34.2', '3.8', '71.7', '38.3', '75.0', '5.4', '83.3', '54.9', '43.4', '57.6', '81.3', '82.1', '30.2', '23.3', '37.7', '65.0', '39.0']
IoU: ['86.7', '27.5', '50.4', '30.4', '24.0', '3.1', '65.7', '31.7', '46.7', '5.0', '52.3', '31.0', '28.2', '44.9', '68.9', '69.9', '21.0', '21.5', '29.7', '57.4', '36.7']
mean IoU: 39.7
Training time 3:06:06
Submitted job_id: 67063
global correct: 87.7
average row correct: ['93.1', '47.3', '47.4', '56.6', '28.8', '9.4', '76.5', '34.7', '76.6', '5.6', '83.5', '48.8', '38.7', '61.9', '79.5', '83.3', '36.7', '35.5', '39.7', '71.7', '35.9']
IoU: ['86.5', '26.2', '44.5', '39.5', '25.2', '8.3', '68.8', '31.8', '45.9', '5.1', '52.0', '28.0', '28.7', '52.9', '69.6', '71.0', '21.2', '31.5', '29.6', '58.0', '31.9']
mean IoU: 40.8
Training time 2:21:36
Running the same experiment using latest main 56e707bfccb62ada836d21e431d6db0d10dd73a1:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 67099
global correct: 87.8
average row correct: ['93.3', '43.3', '59.2', '48.7', '34.1', '2.1', '72.4', '38.6', '68.5', '8.2', '84.9', '52.7', '45.8', '50.9', '81.3', '83.1', '40.6', '20.7', '36.4', '68.4', '37.4']
IoU: ['86.8', '36.2', '54.4', '27.1', '25.2', '1.8', '65.3', '34.4', '41.4', '7.8', '44.8', '30.1', '28.8', '44.7', '69.6', '72.0', '27.4', '18.2', '29.2', '57.4', '30.5']
mean IoU: 39.7
Training time 2:11:46
Submitted job_id: 67100
global correct: 87.9
average row correct: ['93.4', '41.9', '50.3', '54.4', '31.0', '7.4', '70.3', '34.0', '72.6', '6.1', '74.2', '54.6', '48.8', '55.7', '80.9', '83.0', '31.6', '33.0', '30.8', '67.0', '34.0']
IoU: ['86.8', '32.2', '47.6', '33.0', '26.4', '6.4', '65.4', '31.8', '48.6', '5.7', '40.3', '31.3', '29.4', '45.1', '69.3', '71.0', '21.4', '30.5', '24.7', '56.0', '31.3']
mean IoU: 39.7
Training time 2:17:05
Submitted job_id: 67101
global correct: 87.7
average row correct: ['93.0', '37.0', '47.2', '54.4', '32.1', '2.3', '67.1', '38.3', '77.9', '11.3', '85.0', '56.3', '38.1', '52.8', '77.5', '83.1', '32.3', '32.3', '40.5', '72.0', '35.0']
IoU: ['86.7', '31.8', '44.8', '36.2', '23.0', '1.8', '63.1', '34.7', '51.9', '10.5', '41.0', '31.5', '24.7', '40.5', '67.1', '70.6', '24.1', '27.1', '30.7', '58.0', '31.1']
mean IoU: 39.6
Training time 2:16:45
Result: As we can see, main and the prototype produce equivalent results. The accuracy is reduced because these runs don't use pre-trained backbones.
Segmentation with pre-trained backbones
Using githash 5a311b3:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67310
global correct: 90.4
average row correct: ['94.1', '68.9', '63.6', '79.0', '36.9', '29.5', '81.9', '57.5', '82.8', '30.7', '90.0', '57.6', '79.1', '80.8', '86.9', '88.4', '36.7', '67.7', '57.2', '80.6', '59.7']
IoU: ['89.2', '53.1', '59.3', '68.7', '30.9', '24.3', '76.8', '49.7', '71.8', '25.5', '59.7', '36.1', '59.3', '58.7', '72.6', '78.7', '25.6', '57.4', '38.1', '54.0', '51.7']
mean IoU: 54.3
Training time 2:19:32
Submitted job_id: 67311
global correct: 90.7
average row correct: ['94.4', '70.9', '55.5', '80.1', '37.6', '26.7', '85.8', '58.1', '83.2', '28.1', '89.5', '58.7', '77.0', '80.7', '86.8', '88.1', '29.3', '71.3', '62.6', '81.3', '63.2']
IoU: ['89.5', '54.1', '52.4', '72.9', '32.5', '22.7', '79.8', '50.5', '69.2', '23.6', '61.7', '37.8', '57.2', '58.8', '72.9', '78.8', '21.2', '58.5', '40.5', '56.5', '54.2']
mean IoU: 54.5
Training time 2:39:48
Submitted job_id: 67312
global correct: 90.7
average row correct: ['94.3', '72.0', '59.0', '78.3', '37.3', '28.6', '85.3', '57.8', '83.5', '28.8', '90.2', '60.7', '79.3', '76.9', '85.2', '87.8', '38.0', '72.2', '70.9', '79.8', '64.2']
IoU: ['89.5', '54.0', '56.0', '69.3', '30.8', '24.0', '78.0', '50.5', '67.9', '24.1', '60.3', '38.5', '64.6', '56.7', '72.6', '78.1', '27.4', '60.8', '47.6', '55.2', '60.1']
mean IoU: 55.5
Training time 2:25:39
Running the same experiment using latest main 56e707bfccb62ada836d21e431d6db0d10dd73a1:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67340
global correct: 90.2
average row correct: ['93.8', '73.1', '61.8', '77.1', '37.2', '27.4', '84.0', '60.1', '85.0', '28.9', '90.2', '60.9', '77.5', '75.0', '86.8', '88.0', '33.4', '68.1', '56.6', '80.1', '61.2']
IoU: ['88.9', '54.5', '58.3', '70.3', '32.3', '23.3', '78.6', '52.0', '72.0', '24.0', '59.0', '35.9', '59.0', '56.7', '73.0', '78.5', '24.4', '55.8', '37.3', '56.3', '56.4']
mean IoU: 54.6
Training time 2:17:01
Submitted job_id: 67341
global correct: 90.4
average row correct: ['93.9', '67.0', '61.8', '77.8', '48.7', '25.4', '85.6', '59.1', '84.8', '31.0', '90.3', '60.5', '79.7', '80.0', '86.8', '88.7', '38.9', '66.0', '55.2', '81.3', '63.4']
IoU: ['89.2', '50.5', '58.2', '71.1', '41.6', '22.0', '79.4', '51.0', '70.4', '25.1', '58.3', '36.9', '63.0', '59.8', '71.2', '78.8', '27.6', '55.0', '36.6', '54.0', '55.2']
mean IoU: 55.0
Training time 2:14:00
Submitted job_id: 67342
global correct: 90.4
average row correct: ['94.0', '70.4', '67.0', '76.9', '38.2', '22.3', '86.6', '59.8', '83.8', '29.2', '90.2', '58.7', '79.2', '80.1', '83.8', '88.6', '31.3', '69.7', '55.2', '79.1', '65.0']
IoU: ['89.1', '51.9', '62.2', '71.4', '31.8', '19.7', '82.5', '51.4', '69.8', '24.8', '61.0', '35.5', '59.6', '59.2', '72.0', '78.8', '22.0', '58.9', '38.3', '55.4', '58.4']
mean IoU: 54.9
Training time 2:14:29
Result: As we can see, the accuracies are comparable. The speed is slightly reduced, but these experiments are heavily affected by IO bandwidth since the cluster is not isolated; we should follow up with isolated benchmarks to measure accurately. We also observe that the final accuracy on both branches is lower than the historical value. Since the issue also appears on main, it is not related to Transforms V2 but rather to a regression we will investigate separately.
We therefore conclude that all transforms provide equivalent accuracies across Image Classification, Detection and Segmentation. The speed seems slightly worse on V2, which is expected. We will now work on improving the performance of the Transforms and on adding Video support.
Tensor Backend + antialias=True
Compared against the PIL backend runs reported previously [1, 2, 3, 4].
For the Video support, check https://github.com/pytorch/vision/pull/6433#issuecomment-1274585805.
Classification
Augmentation: ta_wide + random erasing + mixup + cutmix
Target Acc: 80.668 / 95.258 - time: 1 day, 23:14:27 - jobid: experiments/PR6433/59857
Using githash 6ef4d82:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 68029
Test: EMA Acc@1 80.626 Acc@5 95.310
Training time 2 days, 6:04:18
Result: Comparable accuracy between backends. The speed seems significantly reduced; we need to investigate further with benchmarks.
Augmentation: aa + random erasing
Target Acc: 66.904 / 86.898 - time: 1 day, 19:30:46 - jobid: experiments/PR6433/62122
Using githash 6ef4d82:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 68030
Test: Acc@1 66.044 Acc@5 86.338
Training time 2 days, 4:55:57
Result: The accuracy is lower but within the band of run-to-run randomness seen in the previous runs above, so no concerns over accuracy. The speed seems significantly reduced; we need to investigate further with benchmarks.
Detection
Augmentation: multiscale
Target Acc: 0.414 - time: 9:54:21 - jobid: experiments/PR6433/60097
Using githash e9c480e:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67794
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.415
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.618
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.440
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.266
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.453
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.541
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.338
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.546
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.588
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.419
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
Training time 9:49:04
Result: Comparable accuracy, comparable speed. The two backends work the same.
Augmentation: ssdlite
Target Acc: 0.210 - time: 1 day, 16:39:09 - jobid: experiments/PR6433/60972
Using githash e9c480e:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67795
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.342
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.222
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.446
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.207
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.332
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.044
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.339
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.648
Training time 1 day, 16:06:26
Result: Comparable accuracy, comparable speed. The two backends work the same.
Augmentation: ssd
Target Acc: 0.252 - time: 16:01:12 - jobid: experiments/PR6433/60650
Using githash e9c480e:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67796
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.252
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.417
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.262
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.057
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.239
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.092
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599
Training time 17:28:42
Result: Comparable accuracy across backends. The speed looks lower, but training is heavily affected by IO bandwidth on the cluster, so further checks are needed to confirm.
Augmentation: lsj + copypaste
Target Acc: 0.474 / 0.416 - time: 3 days, 14:04:04 - jobid: experiments/PR6433/61716
Using githash e9c480e:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67791
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.474
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.680
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.518
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.304
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.370
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.589
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.618
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.444
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.654
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.770
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.416
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.650
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.446
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.221
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.442
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.608
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.337
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.362
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.586
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.714
Training time 3 days, 15:09:55
Result: Comparable accuracy, comparable speed. The two backends work the same.
Segmentation
Target Acc: 90.7 / 55.5 - time: 2:25:39 - jobid: experiments/PR6433/67312
Using githash e9c480e:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67797
global correct: 90.5
average row correct: ['94.0', '72.6', '63.3', '78.6', '37.8', '25.2', '86.0', '58.7', '83.8', '29.4', '90.6', '61.8', '77.3', '77.8', '85.4', '87.9', '39.2', '71.7', '57.7', '80.1', '62.9']
IoU: ['89.3', '53.4', '58.6', '71.0', '29.6', '21.8', '80.8', '51.0', '69.5', '24.0', '60.8', '38.1', '57.7', '58.6', '71.5', '78.5', '27.2', '61.1', '36.6', '56.3', '54.2']
mean IoU: 54.7
Training time 2:20:45
Result: Comparable accuracy, comparable speed. The two backends work the same.
We therefore conclude that all transforms provide equivalent accuracies for Image Classification, Detection and Segmentation across the two backends. The Tensor backend appears slower in some cases, but the performance work that @vfdev-5 is currently doing should help close the gap.
Video
Original recipe
Target Acc: 68.368 / 88.050 - time: 2 days, 13:07:05 - jobid: experiments/PR6412/59255
Using githash 669b1ba:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --train-resize-size 256 256 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
Submitted job_id: 71369
Training time 2 days, 12:12:11
Validated using 128-frame clips, as with the original model:
trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume ~/experiments/PR6433/71369/model_44.pth
* Clip Acc@1 68.333 Clip Acc@5 87.980
Result: Comparable accuracy, comparable speed. This confirms that our transforms work as expected for Video.
New Recipe
Using githash d5f1532:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --auto-augment ta_wide --mixup-alpha 0.8 --cutmix-alpha 1.0 --random-erase 0.25 --train-resize-size 256 320 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
Submitted job_id: 72508
Training time 3 days, 6:00:05
Validated using 128-frame clips, as with the original model:
trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume ~/experiments/PR6433/72508/model_42.pth
* Clip Acc@1 70.903 Clip Acc@5 90.434
Result: Improved accuracy with the new recipe. The augmentations work as expected for Video.
Tensor Backend + antialias=True
Reverifying the new API after the speed optimizations. Reference runs at https://github.com/pytorch/vision/pull/6433#issuecomment-1256741233 and https://github.com/pytorch/vision/pull/6433#issuecomment-1274585805
Classification
Augmentation: ta_wide + random erasing + mixup + cutmix
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 80.626 / 95.310 - time: 2 days, 6:04:18 - jobid: experiments/PR6433/68029
Submitted job_id: 75703
Test: EMA Acc@1 80.862 Acc@5 95.476
Training time 2 days, 0:12:39
Result: Similar accuracy, 11% faster than unoptimized V2.
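The speedup percentages quoted in these results can be reproduced from the reported wall-clock times; a minimal sketch (the duration strings are copied from this log):

```python
import re

def to_seconds(duration: str) -> int:
    """Parse a duration like '2 days, 6:04:18' or '9:46:33' into seconds."""
    m = re.match(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})", duration)
    days, hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

def speedup_pct(old: str, new: str) -> float:
    """Percentage reduction in wall-clock time of `new` relative to `old`."""
    t_old, t_new = to_seconds(old), to_seconds(new)
    return 100 * (t_old - t_new) / t_old

# Unoptimized V2 (jobid 68029) vs this run: the 11% quoted above.
print(round(speedup_pct("2 days, 6:04:18", "2 days, 0:12:39")))  # 11
```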
Augmentation: aa + random erasing
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 66.044 / 86.338 - time: 2 days, 4:55:57 - jobid: experiments/PR6433/68030
Submitted job_id: 75704
Test: Acc@1 67.146 Acc@5 87.086
Training time 1 day, 22:05:18
Result: Similar accuracy (improvement not statistically significant), 13% faster than unoptimized V2.
Detection
Augmentation: multiscale
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.415 - time: 9:49:04 - jobid: experiments/PR6433/67794
Submitted job_id: 75705
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.413
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.615
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.437
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.273
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.456
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.536
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.337
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.544
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.585
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.625
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
Training time 9:46:33
Result: Similar accuracy and speed.
Augmentation: ssdlite
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.212 - time: 1 day, 16:06:26 - jobid: experiments/PR6433/67795
Submitted job_id: 75706
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.341
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.218
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.207
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.330
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.338
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
Training time 1 day, 14:44:39
Result: Similar accuracy, 8% faster than unoptimized V2.
Augmentation: ssd
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.252 - time: 17:28:42 - jobid: experiments/PR6433/67796
Submitted job_id: 75707
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.254
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.421
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.264
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.238
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.367
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599
Training time 16:43:37
Result: Similar accuracy, 4% faster than unoptimized V2.
Augmentation: lsj + copypaste
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.474 / 0.416 - time: 3 days, 15:09:55 - jobid: experiments/PR6433/67791
Submitted job_id: 75998
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.480
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.682
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.526
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.592
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.621
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.447
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.419
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.655
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.452
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.447
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.526
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.550
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.589
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.709
Training time 3 days, 14:07:56
Result: Similar accuracy and speed.
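The "similar accuracy" calls in these results amount to checking that the measured AP stays within a small margin of the V2 baseline; a minimal sketch (the 0.01 tolerance is an assumption, not an official threshold):

```python
def similar_accuracy(measured: float, target: float, tol: float = 0.01) -> bool:
    """True when the measured metric is within `tol` of the baseline."""
    return abs(measured - target) <= tol

# bbox / segm AP of this run vs the V2 targets quoted above:
assert similar_accuracy(0.480, 0.474)  # bbox AP
assert similar_accuracy(0.419, 0.416)  # segm AP
```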
Segmentation
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
# V2 Target Acc: 90.5 / 54.7 - time: 2:20:45 - jobid: experiments/PR6433/67797
Submitted job_id: 75997
global correct: 90.7
average row correct: ['94.8', '72.7', '63.3', '79.2', '43.5', '28.8', '84.4', '60.7', '84.2', '28.4', '90.5', '52.2', '80.7', '74.2', '85.6', '88.2', '33.3', '67.6', '55.9', '77.9', '59.0']
IoU: ['89.6', '52.9', '59.2', '69.2', '38.5', '24.2', '79.2', '52.2', '71.6', '23.2', '56.8', '35.8', '62.3', '56.3', '72.0', '78.3', '23.1', '57.7', '37.3', '54.6', '54.7']
mean IoU: 54.7
Training time 2:20:49
Result: Similar accuracy and speed.
Video
New Recipe
Using githash 959af2d:
PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --auto-augment ta_wide --mixup-alpha 0.8 --cutmix-alpha 1.0 --random-erase 0.25 --train-resize-size 256 320 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
# V2 Target Acc: 70.903 / 90.434 - time: 3 days, 6:00:05 - jobid: experiments/PR6433/72508
Submitted job_id: 75701
Training time 3 days, 3:42:15
trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume experiments/PR6433/75701/model_40.pth
* Clip Acc@1 71.134 Clip Acc@5 90.486
Result: Similar accuracy, faster speed. The improvement is harder to quantify because the logs indicate an IO slowdown caused by OnTap during at least 3 epochs. Outside those epochs, the new version is consistently 6-7 minutes per epoch faster than the old, which translates to roughly a 7-8% improvement.
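For reference, the 7-8% figure follows from the per-epoch savings; a sketch assuming a hypothetical ~87-minute epoch for the old run (a figure consistent with the quoted 6-7 minute savings, not taken from the logs):

```python
def epoch_speedup_pct(old_epoch_min: float, saved_min: float) -> float:
    """Per-epoch improvement as a percentage of the old epoch time."""
    return 100 * saved_min / old_epoch_min

# Hypothetical 87-minute baseline epoch, 6-7 minutes saved per epoch:
print(round(epoch_speedup_pct(87, 6), 1))  # 6.9
print(round(epoch_speedup_pct(87, 7), 1))  # 8.0
```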