DALI

Why is the speed of every loader in ["pytorch", "dali", "dali_proxy"] the same on the ImageNet val dataset?

Open · BiaoBiaoLi opened this issue 6 months ago · 3 comments

Describe the question.

Thanks for your work! I have run into a problem; could you please help me? I am using the ResNet example from the PyTorch use cases and tested the three modes ["pytorch", "dali", "dali_proxy"] on the ImageNet val dataset, but I get the same speed with all of them. How can I improve the data loader speed? Thank you. I use the commands below with bs=256 and works_num = 16, and I record the validation time like this:

```python
if args.evaluate:
    start_time = time.time()
    validate(val_loader, model, criterion)
    print(args.data_loader, " eval spend time ", time.time() - start_time)
    return
```

1. `python resnet50_dali_main_lastest.py -a resnet50 --pretrained -e /cache/ILSVRC_2012/ --data_loader dali`; the log is below:

   ```
   Test: [0/196] Time 2.897 (2.897) Speed 88.374 (88.374) Loss 0.4836 (0.4836) Prec@1 85.938 (85.938) Prec@5 97.656 (97.656)
   Test: [10/196] Time 0.254 (0.492) Speed 1006.163 (520.375) Loss 0.9584 (0.6698) Prec@1 77.734 (82.919) Prec@5 92.969 (95.526)
   Test: [20/196] Time 0.252 (0.377) Speed 1014.990 (678.508) Loss 0.7611 (0.6834) Prec@1 86.328 (82.906) Prec@5 92.188 (95.387)
   Test: [30/196] Time 0.248 (0.338) Speed 1034.300 (756.694) Loss 0.7936 (0.6397) Prec@1 80.469 (84.098) Prec@5 95.312 (95.779)
   Test: [40/196] Time 0.248 (0.317) Speed 1034.085 (806.879) Loss 0.6592 (0.6858) Prec@1 82.031 (82.346) Prec@5 96.094 (95.741)
   Test: [50/196] Time 0.247 (0.304) Speed 1037.235 (841.413) Loss 0.4785 (0.6842) Prec@1 88.672 (82.169) Prec@5 97.656 (95.856)
   Test: [60/196] Time 0.247 (0.296) Speed 1034.767 (864.790) Loss 0.9261 (0.6984) Prec@1 76.172 (81.737) Prec@5 95.703 (95.927)
   Test: [70/196] Time 0.237 (0.290) Speed 1081.450 (881.430) Loss 0.7178 (0.6850) Prec@1 78.906 (81.976) Prec@5 96.484 (96.044)
   Test: [80/196] Time 0.250 (0.285) Speed 1022.285 (897.980) Loss 1.4714 (0.7118) Prec@1 62.109 (81.481) Prec@5 88.281 (95.727)
   Test: [90/196] Time 0.252 (0.281) Speed 1014.157 (910.917) Loss 1.8526 (0.7604) Prec@1 57.031 (80.426) Prec@5 85.547 (95.240)
   Test: [100/196] Time 0.247 (0.278) Speed 1035.940 (921.633) Loss 1.1296 (0.8124) Prec@1 67.578 (79.247) Prec@5 91.797 (94.663)
   Test: [110/196] Time 0.248 (0.275) Speed 1030.350 (930.323) Loss 0.8505 (0.8367) Prec@1 76.953 (78.709) Prec@5 94.531 (94.436)
   Test: [120/196] Time 0.240 (0.273) Speed 1065.760 (937.727) Loss 1.2352 (0.8546) Prec@1 70.703 (78.416) Prec@5 87.891 (94.131)
   Test: [130/196] Time 0.244 (0.271) Speed 1048.608 (944.158) Loss 0.6998 (0.8883) Prec@1 80.859 (77.582) Prec@5 96.484 (93.825)
   Test: [140/196] Time 0.248 (0.270) Speed 1031.553 (949.805) Loss 1.0393 (0.9064) Prec@1 74.609 (77.261) Prec@5 92.188 (93.628)
   Test: [150/196] Time 0.245 (0.268) Speed 1044.525 (954.837) Loss 1.0463 (0.9237) Prec@1 75.391 (76.932) Prec@5 89.844 (93.375)
   Test: [160/196] Time 0.250 (0.267) Speed 1025.097 (958.618) Loss 0.6940 (0.9378) Prec@1 85.938 (76.650) Prec@5 94.141 (93.163)
   Test: [170/196] Time 0.250 (0.266) Speed 1025.405 (961.998) Loss 0.6075 (0.9544) Prec@1 81.641 (76.222) Prec@5 98.047 (92.998)
   Test: [180/196] Time 0.253 (0.265) Speed 1012.474 (965.348) Loss 1.2973 (0.9702) Prec@1 69.531 (75.898) Prec@5 92.188 (92.865)
   Test: [190/196] Time 0.259 (0.265) Speed 988.305 (967.457) Loss 1.1865 (0.9687) Prec@1 66.797 (75.881) Prec@5 94.922 (92.893)
    * Prec@1 75.990 Prec@5 92.922
   dali eval spend time 52.22550344467163
   ```
2. `python resnet50_dali_main_lastest.py -a resnet50 --pretrained -e /cache/ILSVRC_2012/ --data_loader pytorch`; the log is below:

   ```
   Test: [0/196] Time 10.827 (10.827) Speed 23.646 (23.646) Loss 0.4835 (0.4835) Prec@1 85.938 (85.938) Prec@5 98.047 (98.047)
   Test: [10/196] Time 0.215 (1.180) Speed 1188.818 (216.867) Loss 0.9643 (0.6704) Prec@1 78.125 (83.132) Prec@5 93.359 (95.703)
   Test: [20/196] Time 0.216 (0.721) Speed 1187.119 (355.045) Loss 0.7694 (0.6839) Prec@1 85.938 (82.999) Prec@5 92.188 (95.406)
   Test: [30/196] Time 0.215 (0.558) Speed 1188.629 (458.804) Loss 0.7975 (0.6403) Prec@1 81.641 (84.236) Prec@5 94.531 (95.741)
   Test: [40/196] Time 0.216 (0.475) Speed 1186.597 (538.887) Loss 0.6590 (0.6857) Prec@1 82.031 (82.584) Prec@5 97.656 (95.770)
   Test: [50/196] Time 0.216 (0.424) Speed 1186.812 (603.153) Loss 0.4781 (0.6839) Prec@1 89.453 (82.338) Prec@5 97.266 (95.918)
   Test: [60/196] Time 0.216 (0.391) Speed 1185.402 (654.856) Loss 0.9100 (0.6974) Prec@1 76.562 (81.865) Prec@5 94.922 (95.959)
   Test: [70/196] Time 0.221 (0.367) Speed 1158.536 (698.258) Loss 0.7313 (0.6844) Prec@1 78.906 (82.130) Prec@5 96.094 (96.072)
   Test: [80/196] Time 0.216 (0.348) Speed 1185.936 (735.491) Loss 1.4632 (0.7108) Prec@1 62.109 (81.573) Prec@5 87.891 (95.737)
   Test: [90/196] Time 0.216 (0.334) Speed 1187.928 (766.888) Loss 1.8344 (0.7582) Prec@1 55.859 (80.473) Prec@5 87.109 (95.252)
   Test: [100/196] Time 0.216 (0.322) Speed 1185.767 (794.051) Loss 1.1352 (0.8107) Prec@1 67.188 (79.339) Prec@5 91.406 (94.678)
   Test: [110/196] Time 0.221 (0.313) Speed 1160.339 (818.188) Loss 0.8528 (0.8348) Prec@1 78.125 (78.832) Prec@5 94.141 (94.429)
   Test: [120/196] Time 0.216 (0.305) Speed 1186.894 (839.556) Loss 1.2527 (0.8528) Prec@1 70.312 (78.503) Prec@5 85.938 (94.105)
   Test: [130/196] Time 0.216 (0.298) Speed 1186.192 (858.536) Loss 0.7043 (0.8863) Prec@1 81.641 (77.630) Prec@5 96.094 (93.792)
   Test: [140/196] Time 0.216 (0.292) Speed 1186.622 (875.715) Loss 1.0336 (0.9043) Prec@1 75.391 (77.313) Prec@5 91.406 (93.578)
   Test: [150/196] Time 0.216 (0.287) Speed 1187.178 (891.155) Loss 1.0479 (0.9217) Prec@1 75.000 (77.005) Prec@5 89.844 (93.328)
   Test: [160/196] Time 0.216 (0.283) Speed 1187.820 (904.935) Loss 0.7137 (0.9361) Prec@1 86.328 (76.732) Prec@5 94.531 (93.107)
   Test: [170/196] Time 0.215 (0.279) Speed 1188.772 (917.694) Loss 0.6185 (0.9530) Prec@1 83.984 (76.341) Prec@5 97.656 (92.941)
   Test: [180/196] Time 0.216 (0.275) Speed 1187.355 (929.386) Loss 1.2993 (0.9690) Prec@1 67.969 (76.023) Prec@5 92.578 (92.803)
   Test: [190/196] Time 0.215 (0.272) Speed 1188.892 (940.125) Loss 1.1717 (0.9671) Prec@1 69.531 (76.043) Prec@5 96.094 (92.834)
    * Prec@1 76.146 Prec@5 92.872
   pytorch eval spend time 53.391836643218994
   ```
3. `python resnet50_dali_main_lastest.py -a resnet50 --pretrained -e /cache/ILSVRC_2012/ --data_loader dali_proxy`; the log is below:

   ```
   Test: [0/196] Time 6.960 (6.960) Speed 36.779 (36.779) Loss 0.4836 (0.4836) Prec@1 85.938 (85.938) Prec@5 97.656 (97.656)
   Test: [10/196] Time 0.218 (0.829) Speed 1172.853 (308.770) Loss 0.9584 (0.6698) Prec@1 77.734 (82.919) Prec@5 92.969 (95.526)
   Test: [20/196] Time 0.220 (0.537) Speed 1165.981 (476.820) Loss 0.7611 (0.6834) Prec@1 86.328 (82.906) Prec@5 92.188 (95.387)
   Test: [30/196] Time 0.216 (0.433) Speed 1187.152 (590.872) Loss 0.7936 (0.6397) Prec@1 80.469 (84.098) Prec@5 95.312 (95.779)
   Test: [40/196] Time 0.254 (0.384) Speed 1008.628 (666.967) Loss 0.6592 (0.6858) Prec@1 82.031 (82.346) Prec@5 96.094 (95.741)
   Test: [50/196] Time 0.250 (0.358) Speed 1025.921 (715.786) Loss 0.4785 (0.6842) Prec@1 88.672 (82.169) Prec@5 97.656 (95.856)
   Test: [60/196] Time 0.254 (0.340) Speed 1009.215 (752.921) Loss 0.9261 (0.6984) Prec@1 76.172 (81.737) Prec@5 95.703 (95.927)
   Test: [70/196] Time 0.245 (0.328) Speed 1043.773 (781.475) Loss 0.7178 (0.6850) Prec@1 78.906 (81.976) Prec@5 96.484 (96.044)
   Test: [80/196] Time 0.244 (0.319) Speed 1050.082 (803.077) Loss 1.4714 (0.7118) Prec@1 62.109 (81.481) Prec@5 88.281 (95.727)
   Test: [90/196] Time 0.266 (0.311) Speed 961.770 (822.401) Loss 1.8526 (0.7604) Prec@1 57.031 (80.426) Prec@5 85.547 (95.240)
   Test: [100/196] Time 0.246 (0.305) Speed 1042.286 (839.548) Loss 1.1296 (0.8124) Prec@1 67.578 (79.247) Prec@5 91.797 (94.663)
   Test: [110/196] Time 0.250 (0.300) Speed 1022.187 (853.296) Loss 0.8505 (0.8367) Prec@1 76.953 (78.709) Prec@5 94.531 (94.436)
   Test: [120/196] Time 0.246 (0.296) Speed 1038.699 (864.750) Loss 1.2352 (0.8546) Prec@1 70.703 (78.416) Prec@5 87.891 (94.131)
   Test: [130/196] Time 0.248 (0.292) Speed 1032.094 (875.422) Loss 0.6998 (0.8883) Prec@1 80.859 (77.582) Prec@5 96.484 (93.825)
   Test: [140/196] Time 0.248 (0.289) Speed 1033.510 (884.785) Loss 1.0393 (0.9064) Prec@1 74.609 (77.261) Prec@5 92.188 (93.628)
   Test: [150/196] Time 0.246 (0.287) Speed 1040.358 (893.269) Loss 1.0463 (0.9237) Prec@1 75.391 (76.932) Prec@5 89.844 (93.375)
   Test: [160/196] Time 0.248 (0.284) Speed 1030.435 (900.244) Loss 0.6940 (0.9378) Prec@1 85.938 (76.650) Prec@5 94.141 (93.163)
   Test: [170/196] Time 0.247 (0.282) Speed 1035.776 (906.897) Loss 0.6075 (0.9544) Prec@1 81.641 (76.222) Prec@5 98.047 (92.998)
   Test: [180/196] Time 0.248 (0.280) Speed 1030.365 (913.106) Loss 1.2973 (0.9702) Prec@1 69.531 (75.898) Prec@5 92.188 (92.865)
   Test: [190/196] Time 0.257 (0.279) Speed 994.746 (917.857) Loss 1.1865 (0.9687) Prec@1 66.797 (75.881) Prec@5 94.922 (92.893)
    * Prec@1 75.990 Prec@5 92.922
   dali_proxy eval spend time 54.75585222244263
   ```
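One rough way to compare the three runs is to separate the first batch, which presumably includes one-off start-up work, from the steady-state batches. The sketch below assumes, as the Speed column suggests (Speed ≈ 256 / Time), that the first number after `Time` is the duration of that single batch; `LINE_RE` and `summarize` are hypothetical helpers, not part of the example script.

```python
import re
import statistics

# Matches the per-batch fields printed during validation, e.g.
# "Test: [10/196] Time 0.254 (0.492) Speed 1006.163 (520.375) ..."
# Groups: logged batch index, total batches, this batch's time in seconds.
LINE_RE = re.compile(r"Test: \[(\d+)/(\d+)\]\s+Time\s+([\d.]+)")


def summarize(log_text, batch_size=256):
    """Split a validation log into first-batch and steady-state timing.

    Assumes the first number after "Time" is the duration of that single
    batch (consistent with Speed ~= batch_size / Time in the logs).
    """
    matches = LINE_RE.findall(log_text)
    if len(matches) < 2:
        raise ValueError("need at least two logged batches")
    total_batches = int(matches[0][1])
    times = [float(t) for _, _, t in matches]
    first = times[0]                       # includes one-off start-up work
    steady = statistics.median(times[1:])  # robust to occasional slow batches
    return {
        "first_batch_s": first,
        "steady_batch_s": steady,
        "steady_images_per_s": batch_size / steady,
        # Rough total: first batch plus every remaining batch at the steady rate.
        "estimated_total_s": first + (total_batches - 1) * steady,
    }
```

Applied to the three logs above, the median steady-state batch time comes out at 0.216 s for `pytorch` and 0.248 s for both `dali` and `dali_proxy` (roughly 1,185 and 1,032 images/s). Most of the 52 to 55 s total is therefore spent in steady-state batches, and the loaders differ mainly in how long the first batch takes.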

Check for duplicates

  • [x] I have searched the open bugs/issues and have found no duplicates for this bug report

BiaoBiaoLi, Jul 02 '25 03:07

Hi @BiaoBiaoLi,

Thank you for reaching out.

I recommend checking the synthetic data_loader to see how fast you can train your network without any data processing overhead. You can read about this approach here.
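A minimal sketch of the synthetic-loader idea, written as a plain Python loop rather than the example script's own option (its exact flag name is not shown in this thread): `SyntheticLoader` hands out one pre-built batch repeatedly, so iterating over it costs no decoding or I/O, and `timed_loop` splits wall time into time spent waiting for the next batch and time spent in the step. Both names are hypothetical. On a GPU, `sync` should be something like `torch.cuda.synchronize`, because CUDA kernels run asynchronously.

```python
import time


class SyntheticLoader:
    """Yield one pre-built batch `num_batches` times.

    No decoding, augmentation or disk I/O happens per iteration, so a loop over
    this loader approximates the cost of the model step alone.
    """

    def __init__(self, batch, num_batches):
        self.batch = batch
        self.num_batches = num_batches

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            yield self.batch


def timed_loop(loader, step, sync=lambda: None):
    """Run step(batch) for every batch; return (data_wait_s, compute_s).

    `sync` must block until queued device work has finished (for example
    torch.cuda.synchronize); otherwise asynchronous GPU work is charged to
    whichever phase happens to wait for it.
    """
    data_wait = 0.0
    compute = 0.0
    batches = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(batches)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)
        sync()
        t2 = time.perf_counter()
        data_wait += t1 - t0
        compute += t2 - t1
    return data_wait, compute


# Hypothetical PyTorch usage (requires torch, a GPU and a `model`):
#   batch = (torch.randn(256, 3, 224, 224, device="cuda"),
#            torch.zeros(256, dtype=torch.long, device="cuda"))
#   with torch.no_grad():
#       wait_s, compute_s = timed_loop(SyntheticLoader(batch, 196),
#                                      lambda b: model(b[0]),
#                                      sync=torch.cuda.synchronize)
```

If the compute time over 196 synthetic batches is already close to the ~52 s measured with real data, a faster loader has little room to help.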

If there is no significant difference, it means that accelerating your data loading won't help. However, if there is a difference, you can try capturing an Nsight Systems profile to identify the bottleneck. If you find that you are blocked by I/O when reading samples from the drive, you might need either faster storage or more RAM to allow your OS to cache the data.
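For the I/O case, one rough check is to time plain file reads of the validation images with no decoding at all. The sketch below assumes the images sit under a directory such as `/cache/ILSVRC_2012/val` (the `val` sub-directory is an assumption based on the path in the commands above); `iter_files` and `read_throughput` are hypothetical helpers. Run it while the files are not yet in the OS page cache, otherwise it mostly measures RAM.

```python
import os
import time


def iter_files(root):
    """Yield every file path under `root` in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # top-down os.walk honours in-place sorting of dirnames
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def read_throughput(root, max_files=None, chunk_size=1 << 20):
    """Read the raw bytes of up to `max_files` files (no decoding); report rates."""
    n_files = 0
    n_bytes = 0
    start = time.perf_counter()
    for path in iter_files(root):
        if max_files is not None and n_files >= max_files:
            break
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                n_bytes += len(chunk)
        n_files += 1
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return {
        "files": n_files,
        "bytes": n_bytes,
        "seconds": elapsed,
        "files_per_s": n_files / elapsed,
        "mb_per_s": n_bytes / elapsed / 1e6,
    }


# Hypothetical usage; the "val" sub-directory is an assumption:
#   print(read_throughput("/cache/ILSVRC_2012/val", max_files=10000))
```

The logs in the question show the loaders consuming roughly 1,000 to 1,190 images/s in steady state, so a cold-cache rate well above that suggests storage is not the limit, while a rate near or below it points to the faster-storage or more-RAM remedies mentioned above.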

JanuszL, Jul 02 '25 05:07

Thanks for your help. I will test it without any data processing. I would also like to know whether the parallel=True option mentioned in https://github.com/NVIDIA/DALI/issues/3191 can help me load images faster.

BiaoBiaoLi, Jul 02 '25 06:07

@BiaoBiaoLi - in this example I don't believe we use the external source, so this option doesn't apply. As I suggested, please start by checking whether you are truly bottlenecked by data loading/processing.

JanuszL, Jul 02 '25 06:07