Prototype references

Open. pmeier opened this issue 3 years ago.

I don't want to merge this PR. It is more like a feature branch that we can use for discussion. For the actual port we can either clean up this PR or use it as a starting point for another one.

pmeier avatar Aug 17 '22 08:08 pmeier

I've added the needed changes for the detection references.

pmeier avatar Aug 24 '22 13:08 pmeier

I've run the references for a few iterations with the following parameters to confirm they work:

  • Classification:

    [
        "--device=cpu",
        "--batch-size=2",
        "--epochs=1",
        "--workers=2",
        "--mixup-alpha=0.5",
        "--cutmix-alpha=0.5",
        "--auto-augment=ra",  # "ra", "ta_wide", "augmix", "imagenet", "cifar10", "svhn"
        "--random-erase=1.0",
    ]
    
  • Detection:

    [
        "--device=cpu",
        "--batch-size=2",
        "--epochs=1",
        "--workers=2",
        "--data-augmentation=hflip",  # "hflip", "lsj", "multiscale", "ssd", "ssdlite"
        # "--use-copypaste",  # if data_augmention == "lsj"
    ]
    

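The flag lists above match the reference scripts' argument parsers, so a quick smoke test can also be scripted. A minimal sketch, assuming the upstream get_args_parser()/main() entry points of references/classification/train.py (and a valid --data-path, which is omitted here):

# Minimal smoke-test sketch for the classification references.
# Assumes references/classification/train.py exposes get_args_parser() and main().
import sys

sys.path.insert(0, "references/classification")
import train  # references/classification/train.py

args = train.get_args_parser().parse_args([
    "--device=cpu",
    "--batch-size=2",
    "--epochs=1",
    "--workers=2",
    "--mixup-alpha=0.5",
    "--cutmix-alpha=0.5",
    "--auto-augment=ra",
    "--random-erase=1.0",
    # plus "--data-path=..." pointing at an ImageNet-style folder
])
train.main(args)

The detection list can be exercised the same way against references/detection/train.py.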
pmeier avatar Sep 01 '22 08:09 pmeier

~~Detection references are affected by #6528. Do not train before this is merged.~~

pmeier avatar Sep 01 '22 08:09 pmeier

PIL Backend

I'm doing the following runs to confirm the validity of the PIL backend.

Classification

Augmentation: ta_wide + random erasing + mixup + cutmix

Target Acc: 80.854 / 95.428 - time: 1 day, 18:06:05 - jobid: experiments/PR5201/resnet50_sota2/13576

Using githash 4d73fe7:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 59857

Test: EMA Acc@1 80.668 Acc@5 95.258
Training time 1 day, 23:14:27

Result: The accuracy is within the expected bounds, but the training time looks increased.

Augmentation: aa + random erasing

Target Acc: 67.620 / 87.404 - time: 2 days, 17:09:20 - jobid: jobs/PR3354/classification/35753749

Using githash ec120ff:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 60686
Test:  Acc@1 66.776 Acc@5 86.790
Training time 1 day, 19:07:42

Submitted job_id: 60977
Test:  Acc@1 65.830 Acc@5 86.190
Training time 1 day, 17:33:59

Submitted job_id: 60978
Test:  Acc@1 66.952 Acc@5 86.824
Training time 1 day, 18:18:25

Result: The accuracy is lower than expected. I looked at my notes and this specific model had high variance in accuracy when we originally trained it (here are 3 acc@1 values from different runs: 66.240, 67.256, 67.620). So this may still be fine, but it is worth confirming. The training times are not comparable because the old models were trained on different hardware.
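As a quick check of that spread, a small sketch over the acc@1 values quoted in this thread:

# Quick spread check on the acc@1 values quoted above.
import statistics

v1_runs = [66.240, 67.256, 67.620]  # original trainings of mobilenet_v3_small
v2_runs = [66.776, 65.830, 66.952]  # the three v2 prototype runs above

for name, runs in (("v1", v1_runs), ("v2", v2_runs)):
    print(f"{name}: mean={statistics.mean(runs):.3f}, stdev={statistics.stdev(runs):.3f}")
# v1: mean=67.039, stdev=0.715
# v2: mean=66.519, stdev=0.603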

Detection

Augmentation: multiscale

Target Acc: 0.415 - time: 9:54:20 - jobid: experiments/PR5444/34702

Using githash 7cb08d5:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60097

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.414
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.616
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.438
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.258
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.454
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.545
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.588
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.419
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.728
Training time 9:54:21

Result: The accuracy looks as expected and so does the training time.

Augmentation: ssdlite

Target Acc: 0.212 - time: 1 day, 4:11:22 - jobid: jobs/PR3757/2nd_training/41046786

Using githash ec120ff:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60972

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.217
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.331
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.043
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.341
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
Training time 1 day, 16:39:09

Result: The accuracy looks as expected. The training time seems significantly increased despite using better hardware and faster IO to load the data. Definitely worth investigating.

Augmentation: ssd

Target Acc: 0.251 - time: 1 day, 3:40:14 - jobid: jobs/PR3403/4th_training/40773612

Using githash ec120ff:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60650

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.418
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.261
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.239
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.089
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.408
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
Training time 16:01:12

Result: The accuracy looks as expected. The training time can't be compared because we ran on better hardware.

Augmentation: lsj + copypaste

Target Acc: 0.473 / 0.417 - time: 3 days, 19:00:54 - jobid: experiments/PR5825/22644

Using githash ec120ff:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 60654

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.456
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.656
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.496
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.309
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.493
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.360
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.574
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.604
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.755
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.630
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.433
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.433
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.585
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.537
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.353
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.698
Training time 3 days, 14:05:05

Result: The accuracy is significantly reduced. It's worth checking the implementations. The training time seems reduced.

datumbox avatar Sep 02 '22 09:09 datumbox

PIL Backend

Due to the bug discovered in #6541, we need to repeat the experiments that involve LSJ.

Augmentation: lsj + copypaste

Target Acc: 0.473 / 0.417 - time: 3 days, 19:00:54 - jobid: experiments/PR5825/22644

Using githash 49e653f:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 61716

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.474
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.678
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.517
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.511
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.621
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.619
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.460
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.648
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.224
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.444
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.336
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.378
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.588
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.711
Training time 3 days, 14:04:04

Result: The accuracy looks as expected. The training time seems reduced.


To confirm there is no bug in AA, we will run the same tests on the current main branch and check whether the results match.

Augmentation: aa + random erasing

Target Acc: 67.620 / 87.404 - time: 2 days, 17:09:20 - jobid: jobs/PR3354/classification/35753749

Using main:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 62122
Test:  Acc@1 66.904 Acc@5 86.898
Training time 1 day, 19:30:46

Submitted job_id: 62123
Test:  Acc@1 66.620 Acc@5 87.036
Training time 1 day, 17:31:04

Submitted job_id: 62124
Test:  Acc@1 66.644 Acc@5 86.540
Training time 1 day, 17:47:35

Result: The accuracy on the main branch matches the one reported for v2 above. The execution time varies a lot due to OnTap performance, so it's hard to make comparisons.

datumbox avatar Sep 06 '22 22:09 datumbox

@pmeier @vfdev-5 I think we just confirmed that v2 produces the same model accuracy as v1. GGs! :) I don't think we can easily make claims about speed because the jobs are heavily affected by the IO speed of OnTap. We'll use Victor's benchmarks instead.

datumbox avatar Sep 11 '22 10:09 datumbox

PIL Backend

Segmentation

Target Acc: 91.2 / 57.9 - time: 3:13:11 - jobid: jobs/PR3276/3rd_training/35354083

Using githash a2893a1:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 64765
global correct: 87.9
average row correct: ['93.1', '41.7', '54.4', '52.6', '35.2', '13.2', '68.4', '38.3', '76.9', '5.8', '86.2', '56.9', '35.6', '54.5', '80.8', '83.1', '33.5', '30.6', '42.4', '65.7', '38.6']
IoU: ['86.9', '29.5', '51.4', '26.7', '25.0', '11.0', '62.8', '34.6', '45.2', '5.5', '47.0', '32.1', '26.6', '45.2', '66.6', '71.3', '20.0', '27.2', '33.2', '55.5', '34.8']
mean IoU: 39.9
Training time 4:12:37

Submitted job_id: 64766
global correct: 88.0
average row correct: ['93.4', '40.9', '59.2', '49.2', '33.6', '2.5', '70.5', '39.6', '76.7', '3.1', '82.3', '55.6', '45.6', '53.0', '79.9', '83.2', '41.9', '39.5', '32.1', '66.8', '38.2']
IoU: ['87.0', '24.6', '55.4', '29.5', '24.7', '2.3', '66.4', '36.4', '47.5', '3.0', '43.8', '32.5', '35.5', '40.8', '68.5', '71.2', '27.2', '33.5', '26.1', '58.1', '28.4']
mean IoU: 40.1
Training time 2:16:02


Submitted job_id: 64768
global correct: 87.5
average row correct: ['92.9', '41.0', '53.5', '50.8', '34.7', '6.5', '63.5', '38.3', '78.1', '5.4', '75.1', '55.3', '45.6', '52.1', '76.2', '82.3', '32.7', '40.4', '35.2', '69.4', '42.4']
IoU: ['86.5', '33.2', '50.2', '34.1', '24.6', '5.3', '59.9', '34.1', '44.7', '5.1', '46.1', '30.3', '32.4', '43.6', '64.1', '70.4', '22.5', '33.9', '27.8', '51.9', '37.3']
mean IoU: 39.9
Training time 2:17:09

Result: There seems to be an issue with the accuracy; we need to investigate. The speed looks faster, mostly because of new hardware. Our cluster has quite a big variance in execution times, possibly due to IO bandwidth.

datumbox avatar Sep 14 '22 21:09 datumbox

PIL Backend

Due to a discrepancy discovered in the padding of RandomCrop, we reran the experiments:

Segmentation without pre-trained backbones

~~Target Acc: 91.2 / 57.9 - time: 3:13:11 - jobid: jobs/PR3276/3rd_training/35354083~~

Using githash 5a311b3:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 67061
global correct: 87.8
average row correct: ['93.3', '41.0', '47.7', '59.4', '34.9', '3.9', '67.9', '35.7', '74.4', '7.6', '86.4', '51.9', '41.7', '56.4', '80.9', '83.1', '31.8', '28.6', '38.3', '73.2', '36.9']
IoU: ['86.8', '24.6', '45.2', '35.5', '26.0', '3.4', '65.0', '31.9', '47.1', '7.2', '47.2', '29.8', '28.9', '47.4', '65.4', '71.2', '18.9', '25.1', '30.6', '60.5', '32.2']
mean IoU: 39.5
Training time 2:59:50

Submitted job_id: 67062
global correct: 87.7
average row correct: ['93.2', '36.0', '53.1', '48.2', '34.2', '3.8', '71.7', '38.3', '75.0', '5.4', '83.3', '54.9', '43.4', '57.6', '81.3', '82.1', '30.2', '23.3', '37.7', '65.0', '39.0']
IoU: ['86.7', '27.5', '50.4', '30.4', '24.0', '3.1', '65.7', '31.7', '46.7', '5.0', '52.3', '31.0', '28.2', '44.9', '68.9', '69.9', '21.0', '21.5', '29.7', '57.4', '36.7']
mean IoU: 39.7
Training time 3:06:06

Submitted job_id: 67063
global correct: 87.7
average row correct: ['93.1', '47.3', '47.4', '56.6', '28.8', '9.4', '76.5', '34.7', '76.6', '5.6', '83.5', '48.8', '38.7', '61.9', '79.5', '83.3', '36.7', '35.5', '39.7', '71.7', '35.9']
IoU: ['86.5', '26.2', '44.5', '39.5', '25.2', '8.3', '68.8', '31.8', '45.9', '5.1', '52.0', '28.0', '28.7', '52.9', '69.6', '71.0', '21.2', '31.5', '29.6', '58.0', '31.9']
mean IoU: 40.8
Training time 2:21:36

Running the same experiment using latest main 56e707bfccb62ada836d21e431d6db0d10dd73a1:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001
Submitted job_id: 67099
global correct: 87.8
average row correct: ['93.3', '43.3', '59.2', '48.7', '34.1', '2.1', '72.4', '38.6', '68.5', '8.2', '84.9', '52.7', '45.8', '50.9', '81.3', '83.1', '40.6', '20.7', '36.4', '68.4', '37.4']
IoU: ['86.8', '36.2', '54.4', '27.1', '25.2', '1.8', '65.3', '34.4', '41.4', '7.8', '44.8', '30.1', '28.8', '44.7', '69.6', '72.0', '27.4', '18.2', '29.2', '57.4', '30.5']
mean IoU: 39.7
Training time 2:11:46

Submitted job_id: 67100
global correct: 87.9
average row correct: ['93.4', '41.9', '50.3', '54.4', '31.0', '7.4', '70.3', '34.0', '72.6', '6.1', '74.2', '54.6', '48.8', '55.7', '80.9', '83.0', '31.6', '33.0', '30.8', '67.0', '34.0']
IoU: ['86.8', '32.2', '47.6', '33.0', '26.4', '6.4', '65.4', '31.8', '48.6', '5.7', '40.3', '31.3', '29.4', '45.1', '69.3', '71.0', '21.4', '30.5', '24.7', '56.0', '31.3']
mean IoU: 39.7
Training time 2:17:05

Submitted job_id: 67101
global correct: 87.7
average row correct: ['93.0', '37.0', '47.2', '54.4', '32.1', '2.3', '67.1', '38.3', '77.9', '11.3', '85.0', '56.3', '38.1', '52.8', '77.5', '83.1', '32.3', '32.3', '40.5', '72.0', '35.0']
IoU: ['86.7', '31.8', '44.8', '36.2', '23.0', '1.8', '63.1', '34.7', '51.9', '10.5', '41.0', '31.5', '24.7', '40.5', '67.1', '70.6', '24.1', '27.1', '30.7', '58.0', '31.1']
mean IoU: 39.6
Training time 2:16:45

Result: As we can see, both main and the prototype produce equivalent results. The reason the accuracy is reduced is that we don't use pre-trained backbones.


Segmentation with pre-trained backbones

Using githash 5a311b3:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67310
global correct: 90.4
average row correct: ['94.1', '68.9', '63.6', '79.0', '36.9', '29.5', '81.9', '57.5', '82.8', '30.7', '90.0', '57.6', '79.1', '80.8', '86.9', '88.4', '36.7', '67.7', '57.2', '80.6', '59.7']
IoU: ['89.2', '53.1', '59.3', '68.7', '30.9', '24.3', '76.8', '49.7', '71.8', '25.5', '59.7', '36.1', '59.3', '58.7', '72.6', '78.7', '25.6', '57.4', '38.1', '54.0', '51.7']
mean IoU: 54.3
Training time 2:19:32

Submitted job_id: 67311
global correct: 90.7
average row correct: ['94.4', '70.9', '55.5', '80.1', '37.6', '26.7', '85.8', '58.1', '83.2', '28.1', '89.5', '58.7', '77.0', '80.7', '86.8', '88.1', '29.3', '71.3', '62.6', '81.3', '63.2']
IoU: ['89.5', '54.1', '52.4', '72.9', '32.5', '22.7', '79.8', '50.5', '69.2', '23.6', '61.7', '37.8', '57.2', '58.8', '72.9', '78.8', '21.2', '58.5', '40.5', '56.5', '54.2']
mean IoU: 54.5
Training time 2:39:48

Submitted job_id: 67312
global correct: 90.7
average row correct: ['94.3', '72.0', '59.0', '78.3', '37.3', '28.6', '85.3', '57.8', '83.5', '28.8', '90.2', '60.7', '79.3', '76.9', '85.2', '87.8', '38.0', '72.2', '70.9', '79.8', '64.2']
IoU: ['89.5', '54.0', '56.0', '69.3', '30.8', '24.0', '78.0', '50.5', '67.9', '24.1', '60.3', '38.5', '64.6', '56.7', '72.6', '78.1', '27.4', '60.8', '47.6', '55.2', '60.1']
mean IoU: 55.5
Training time 2:25:39

Running the same experiment using latest main 56e707bfccb62ada836d21e431d6db0d10dd73a1:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67340
global correct: 90.2
average row correct: ['93.8', '73.1', '61.8', '77.1', '37.2', '27.4', '84.0', '60.1', '85.0', '28.9', '90.2', '60.9', '77.5', '75.0', '86.8', '88.0', '33.4', '68.1', '56.6', '80.1', '61.2']
IoU: ['88.9', '54.5', '58.3', '70.3', '32.3', '23.3', '78.6', '52.0', '72.0', '24.0', '59.0', '35.9', '59.0', '56.7', '73.0', '78.5', '24.4', '55.8', '37.3', '56.3', '56.4']
mean IoU: 54.6
Training time 2:17:01

Submitted job_id: 67341
global correct: 90.4
average row correct: ['93.9', '67.0', '61.8', '77.8', '48.7', '25.4', '85.6', '59.1', '84.8', '31.0', '90.3', '60.5', '79.7', '80.0', '86.8', '88.7', '38.9', '66.0', '55.2', '81.3', '63.4']
IoU: ['89.2', '50.5', '58.2', '71.1', '41.6', '22.0', '79.4', '51.0', '70.4', '25.1', '58.3', '36.9', '63.0', '59.8', '71.2', '78.8', '27.6', '55.0', '36.6', '54.0', '55.2']
mean IoU: 55.0
Training time 2:14:00

Submitted job_id: 67342
global correct: 90.4
average row correct: ['94.0', '70.4', '67.0', '76.9', '38.2', '22.3', '86.6', '59.8', '83.8', '29.2', '90.2', '58.7', '79.2', '80.1', '83.8', '88.6', '31.3', '69.7', '55.2', '79.1', '65.0']
IoU: ['89.1', '51.9', '62.2', '71.4', '31.8', '19.7', '82.5', '51.4', '69.8', '24.8', '61.0', '35.5', '59.6', '59.2', '72.0', '78.8', '22.0', '58.9', '38.3', '55.4', '58.4']
mean IoU: 54.9
Training time 2:14:29

Result: As we can see, the accuracies are comparable. The speed is slightly reduced, but these experiments are heavily affected by IO bandwidth since the cluster is not isolated. We should follow up with isolated benchmarks to measure accurately. We also observe that the end accuracy in both branches is lower than the historical value. Since the issue is observed also on main, this is not related to Transforms V2 but rather to a regression that we will investigate separately.

We therefore conclude that all transforms provide equivalent accuracies across Image Classification, Detection and Segmentation. The speed seems slightly worse on V2, which is expected. We will now work on improving the performance of the Transforms and on adding Video support.

datumbox avatar Sep 21 '22 13:09 datumbox

Tensor Backend + antialias=True

Compared against the PIL backend runs reported previously [1, 2, 3, 4].

For the Video support, check https://github.com/pytorch/vision/pull/6433#issuecomment-1274585805.
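For clarity, the two backends only differ in the input type handed to the transforms; a minimal sketch, assuming the released torchvision.transforms.v2 API (the PR itself used the prototype namespace):

# Minimal sketch of the two backends under comparison.
import torch
import PIL.Image
from torchvision.transforms import v2

resize = v2.Resize(232, antialias=True)

pil_image = PIL.Image.new("RGB", (320, 240))                            # PIL backend
tensor_image = torch.randint(0, 256, (3, 240, 320), dtype=torch.uint8)  # Tensor backend

out_pil = resize(pil_image)        # PIL resizing always antialiases
out_tensor = resize(tensor_image)  # tensor resizing antialiases only when antialias=True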

Classification

Augmentation: ta_wide + random erasing + mixup + cutmix

Target Acc: 80.668 / 95.258 - time: 1 day, 23:14:27 - jobid: experiments/PR6433/59857

Using githash 6ef4d82:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 68029
Test: EMA Acc@1 80.626 Acc@5 95.310
Training time 2 days, 6:04:18

Result: Comparable accuracy between backends. The speed seems significantly reduced; we need to investigate further with benchmarks.

Augmentation: aa + random erasing

Target Acc: 66.904 / 86.898 - time: 1 day, 19:30:46 - jobid: experiments/PR6433/62122

Using githash 6ef4d82:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
Submitted job_id: 68030
Test:  Acc@1 66.044 Acc@5 86.338
Training time 2 days, 4:55:57

Result: The accuracy is lower but within the band of run-to-run variance (see the previous runs above), so no concerns over accuracy. The speed seems significantly reduced; we need to investigate further with benchmarks.

Detection

Augmentation: multiscale

Target Acc: 0.414 - time: 9:54:21 - jobid: experiments/PR6433/60097

Using githash e9c480e:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67794

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.415
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.618
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.440
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.266
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.453
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.541
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.546
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.588
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.419
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731
Training time 9:49:04

Result: Comparable accuracy, comparable speed. The two backends work the same.

Augmentation: ssdlite

Target Acc: 0.210 - time: 1 day, 16:39:09 - jobid: experiments/PR6433/60972

Using githash e9c480e:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67795

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.212
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.222
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.446
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.332
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.044
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.339
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.648
Training time 1 day, 16:06:26

Result: Comparable accuracy, comparable speed. The two backends work the same.

Augmentation: ssd

Target Acc: 0.252 - time: 16:01:12 - jobid: experiments/PR6433/60650

Using githash e9c480e:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67796

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.417
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.262
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.057
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.239
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.366
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.092
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599
Training time 17:28:42

Result: Comparable accuracy across backends. The speed looks lower, but the training is heavily affected by IO bandwidth on the cluster, so we need further checks to confirm.

Augmentation: lsj + copypaste

Target Acc: 0.474 / 0.416 - time: 3 days, 14:04:04 - jobid: experiments/PR6433/61716

Using githash e9c480e:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
Submitted job_id: 67791

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.474
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.680
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.518
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.304
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.589
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.618
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.444
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.654
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.770
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.650
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.446
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.221
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.442
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.608
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.337
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.549
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.362
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.714
Training time 3 days, 15:09:55

Result: Comparable accuracy, comparable speed. The two backends work the same.

Segmentation

Target Acc: 90.7 / 55.5 - time: 2:25:39 - jobid: experiments/PR6433/67312

Using githash e9c480e:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
Submitted job_id: 67797

global correct: 90.5
average row correct: ['94.0', '72.6', '63.3', '78.6', '37.8', '25.2', '86.0', '58.7', '83.8', '29.4', '90.6', '61.8', '77.3', '77.8', '85.4', '87.9', '39.2', '71.7', '57.7', '80.1', '62.9']
IoU: ['89.3', '53.4', '58.6', '71.0', '29.6', '21.8', '80.8', '51.0', '69.5', '24.0', '60.8', '38.1', '57.7', '58.6', '71.5', '78.5', '27.2', '61.1', '36.6', '56.3', '54.2']
mean IoU: 54.7
Training time 2:20:45

Result: Comparable accuracy, comparable speed. The two backends work the same.

We therefore conclude that all transforms provide equivalent accuracies across Image Classification, Detection and Segmentation across the 2 backends. The speed seems worse in the Tensor backend in some cases but the work that @vfdev-5 is currently doing on improving the performance should help close the gap.

datumbox avatar Sep 23 '22 22:09 datumbox

Video

Original recipe

Target Acc: 68.368 / 88.050 - time: 2 days, 13:07:05 - jobid: experiments/PR6412/59255

Using githash 669b1ba:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --train-resize-size 256 256 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
Submitted job_id: 71369
Training time 2 days, 12:12:11

Validated using a clip length of 128, as for the original model:

trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume ~/experiments/PR6433/71369/model_44.pth
* Clip Acc@1 68.333 Clip Acc@5 87.980

Result: Comparable accuracy, comparable speed. This confirms that our transforms work as expected for Video.
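For reference, a minimal sketch of how the same transforms apply to a video clip, assuming the released torchvision.transforms.v2 / tv_tensors API (the PR itself used the prototype namespace):

# Minimal sketch of running v2 transforms on a video clip of shape (T, C, H, W).
import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

clip = tv_tensors.Video(torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8))

transform = v2.Compose([
    v2.RandomResizedCrop(224, antialias=True),
    v2.RandomHorizontalFlip(),
    v2.ToDtype(torch.float32, scale=True),
])
out = transform(clip)  # still a Video, now (64, 3, 224, 224) float32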

New Recipe

Using githash d5f1532:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --auto-augment ta_wide --mixup-alpha 0.8 --cutmix-alpha 1.0 --random-erase 0.25 --train-resize-size 256 320 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
Submitted job_id: 72508
Training time 3 days, 6:00:05

Validated using a clip length of 128, as for the original model:

trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume ~/experiments/PR6433/72508/model_42.pth
 * Clip Acc@1 70.903 Clip Acc@5 90.434

Result: Improved accuracy with the new recipe. The augmentations work as expected for Video.

datumbox avatar Oct 11 '22 12:10 datumbox

Tensor Backend + antialias=True

Reverifying the new API after the speed optimizations. Reference runs at https://github.com/pytorch/vision/pull/6433#issuecomment-1256741233 and https://github.com/pytorch/vision/pull/6433#issuecomment-1274585805

Classification

Augmentation: ta_wide + random erasing + mixup + cutmix

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 80.626 / 95.310 - time: 2 days, 6:04:18 - jobid: experiments/PR6433/68029
Submitted job_id: 75703
Test: EMA Acc@1 80.862 Acc@5 95.476
Training time 2 days, 0:12:39

Result: Similar accuracy, 11% faster than unoptimized V2.

Augmentation: aa + random erasing

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 66.044 / 86.338 - time: 2 days, 4:55:57 - jobid: experiments/PR6433/68030
Submitted job_id: 75704
Test:  Acc@1 67.146 Acc@5 87.086
Training time 1 day, 22:05:18

Result: Similar accuracy (improvement not statistically significant), 13% faster than unoptimized V2.
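The percentages above are presumably computed from the quoted wall-clock training times; a quick check under that assumption:

# Sanity check of the quoted speed-ups from the wall-clock training times above.
from datetime import timedelta

def speedup(old: timedelta, new: timedelta) -> float:
    """Percentage reduction in wall-clock training time."""
    return (old - new) / old * 100

# resnet50 (ta_wide recipe): jobs 68029 -> 75703
print(speedup(timedelta(days=2, hours=6, minutes=4, seconds=18),
              timedelta(days=2, minutes=12, seconds=39)))   # ~10.8
# mobilenet_v3_small (aa recipe): jobs 68030 -> 75704
print(speedup(timedelta(days=2, hours=4, minutes=55, seconds=57),
              timedelta(days=1, hours=22, minutes=5, seconds=18)))  # ~12.9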

Detection

Augmentation: multiscale

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.415 - time: 9:49:04 - jobid: experiments/PR6433/67794
Submitted job_id: 75705
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.413
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.615
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.437
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.273
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.456
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.337
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.544
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.585
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.625
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
Training time 9:46:33

Result: Similar accuracy and speed.

Augmentation: ssdlite

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.212 - time: 1 day, 16:06:26 - jobid: experiments/PR6433/67795
Submitted job_id: 75706
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.218
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
Training time 1 day, 14:44:39

Result: Similar accuracy, 8% faster than unoptimized V2.

Augmentation: ssd

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.252 - time: 17:28:42 - jobid: experiments/PR6433/67796
Submitted job_id: 75707
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.254
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.264
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.238
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599
Training time 16:43:37

Result: Similar accuracy, 4% faster than unoptimized V2.

Augmentation: lsj + copypaste

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.474 / 0.416 - time: 3 days, 15:09:55 - jobid: experiments/PR6433/67791
Submitted job_id: 75998
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.480
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.682
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.621
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.447
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.419
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.655
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.452
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.447
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.336
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.589
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.709
Training time 3 days, 14:07:56

Result: Similar accuracy and speed.

Segmentation

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
# V2 Target Acc: 90.5 / 54.7 - time: 2:20:45 - jobid: experiments/PR6433/67797
Submitted job_id: 75997
global correct: 90.7
average row correct: ['94.8', '72.7', '63.3', '79.2', '43.5', '28.8', '84.4', '60.7', '84.2', '28.4', '90.5', '52.2', '80.7', '74.2', '85.6', '88.2', '33.3', '67.6', '55.9', '77.9', '59.0']
IoU: ['89.6', '52.9', '59.2', '69.2', '38.5', '24.2', '79.2', '52.2', '71.6', '23.2', '56.8', '35.8', '62.3', '56.3', '72.0', '78.3', '23.1', '57.7', '37.3', '54.6', '54.7']
mean IoU: 54.7
Training time 2:20:49 

Result: Similar accuracy and speed.

Video

New Recipe

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --auto-augment ta_wide --mixup-alpha 0.8 --cutmix-alpha 1.0 --random-erase 0.25 --train-resize-size 256 320 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
# V2 Target Acc: 70.903 / 90.434 - time: 3 days, 6:00:05 - jobid: experiments/PR6433/72508
Submitted job_id: 75701
Training time 3 days, 3:42:15
trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume experiments/PR6433/75701/model_40.pth
 * Clip Acc@1 71.134 Clip Acc@5 90.486

Result: Similar accuracy and faster speed. It's a bit harder to estimate the improvement because the logs indicate an IO slowdown caused by OnTap during at least 3 epochs. The new version is consistently 6-7 minutes per epoch faster than the old one, which translates to roughly a 7-8% improvement.

datumbox avatar Nov 04 '22 16:11 datumbox