
[Test] Dygraph DataLoader Phased Optimization

Open chenwhql opened this issue 5 years ago • 1 comments

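The dygraph (dynamic graph) DataLoader has recently been optimized in two rounds:

  • Optimization 1: https://github.com/PaddlePaddle/Paddle/pull/21634
    • Removed some unreasonable parts of the original DataLoader implementation. In my own test, overall ResNet training was 6.2% faster (relative to the pre-optimization DataLoader).
  • Optimization 2: https://github.com/PaddlePaddle/Paddle/pull/21762
    • Uses a child process to speed up data loading. In my own test, the cumulative overall ResNet speedup was 32.2% (relative to the pre-optimization DataLoader).

Both PRs have been merged into develop. This PR tests the overall effect of the two optimizations against the latest code (the results of this test take precedence over the figures above).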

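Test method:

  1. Clone the models repo, then pull the branch of this PR into it (I tested at models version 109a3c7; if there are conflicts, consider switching to that version or resolving them manually).
  2. Return to the dygraph directory, run dataloader_test.sh, and wait for the test to finish.
  3. Run parse_dataloader_test_result.py to print the results to the terminal, then compare and analyze them.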

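Overview of my test procedure:

  1. Tested four models under models/dygraph: mnist, resnet, se_resnet, and transformer.
  2. Changed the number of epochs to 1 in all of these models to shorten the test; all other parameters are unchanged.
  3. Added timing code to each model that accumulates the time spent in the train section and prints the total at the end.
  4. For each model, copied the original train.py to create train_sp.py and train_mp.py:
     • Both train_sp.py and train_mp.py load data through DataLoader.
     • train_sp.py uses the DataLoader after Optimization 1.
     • train_mp.py uses the DataLoader after Optimization 2 (use_multiprocess=True).
  5. Ran every model for a single epoch on both 1 card and 8 cards, saving the logs locally.
  6. Read the test results from the local logs.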
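
Step 3 of the procedure (accumulating the train time and printing it at the end) might look roughly like the sketch below. This is an illustration only, not the code actually added to the models: `timed_train` and `toy_reader` are hypothetical names, and only the final print format mirrors the log lines in the appendix. The timer wraps the whole loop over the reader so that time spent waiting for data is counted, which is what a data-loading comparison needs.

```python
import time


def timed_train(reader, train_step, epochs=1):
    """Run the training loop and return the accumulated wall-clock seconds.

    The timer wraps the whole loop over `reader()`, so time spent waiting
    for data is included along with the compute time of `train_step`.
    """
    total = 0.0
    for _ in range(epochs):
        start = time.time()
        for batch in reader():
            train_step(batch)
        total += time.time() - start
    return total


def toy_reader():
    # Hypothetical stand-in for a real data source.
    for i in range(4):
        yield [i, i + 1]


seen = []  # records every batch handed to the (dummy) train step
total_train_time = timed_train(toy_reader, seen.append, epochs=1)
print("total train time: {} s".format(total_train_time))
```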

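Notes on my test data:

  • Dev machine: yq01-gpu-255-137-12-00, 8 × P40
  • CMake command: cmake .. -DPY_VERSION=3.5 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=ON -DCUDA_ARCH_NAME=Auto -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON
  • GCC version: 5.4.0
  • Column mapping
    • reader column: train.py
    • single-process DataLoader column: train_sp.py
    • multi-process DataLoader column: train_mp.py
  • Additional notes
    • The main optimization so far moves loading data from disk into a separate process, but creating threads and processes also adds overhead.
    • The speedup from this optimization is noticeable only for CV models. NLP models have a light data-loading workload, so they see no clear benefit; for now, multi-process mode is not recommended for NLP models. Further optimization will follow.
    • Table values are rounded to 3 decimal places.
    • Multi-card figures are those of card 0.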

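1. Single-card results (unit: s)

| Model | reader | Single-process DataLoader | Multi-process DataLoader (vs. reader) |
|---|---|---|---|
| mnist | 12.000 | 11.432 | 9.448 (-21.3%) |
| resnet | 94.407 | 83.520 | 63.586 (-32.6%) |
| se_resnext | 180.853 | 134.581 | 128.874 (-28.7%) |
| transformer | 93.559 | 93.331 | 93.000 (-0.6%) |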

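2. 8-card results (unit: s)

| Model | reader | Single-process DataLoader | Multi-process DataLoader (vs. reader) |
|---|---|---|---|
| mnist | 6.367 | 7.272 | 5.971 (-6.2%) |
| resnet | 57.158 | 55.931 | 51.899 (-9.2%) |
| se_resnext | 67.845 | 62.330 | 54.590 (-19.5%) |
| transformer | 22.807 | 24.197 | 23.726 (+4.0%) |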
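
The percentage column can be reproduced from the table values with a small helper. `relative_change` is a hypothetical name, and the formula (multi-process time minus reader time, divided by reader time) is inferred from the figures in the tables rather than stated in the source.

```python
def relative_change(baseline_s, new_s):
    """Percentage change of `new_s` relative to `baseline_s` (negative = faster)."""
    return (new_s - baseline_s) / baseline_s * 100.0


# (reader, multi-process DataLoader) times in seconds, copied from the tables.
single_card = {"mnist": (12.000, 9.448), "resnet": (94.407, 63.586)}
eight_card = {"se_resnext": (67.845, 54.590), "transformer": (22.807, 23.726)}

for name, (reader_s, mp_s) in {**single_card, **eight_card}.items():
    print("{}: {:+.1f}%".format(name, relative_change(reader_s, mp_s)))
```

A positive value, as for the 8-card transformer run, means the multi-process DataLoader was slower than the plain reader.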

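Notes

  • Check the GPU environment variable; setting it to all 8 cards is recommended: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  • Make sure the GPUs are not being used by other jobs.
  • Make sure the CPU is not occupied by heavy jobs either.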
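
The first check in this list can be automated before launching the test script. A minimal sketch, assuming the variable holds a comma-separated list of device indices; `visible_device_ids` is a hypothetical helper and is not part of the test scripts.

```python
import os


def visible_device_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES, or None if unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # variable not set at all
    # Split on commas and drop empty entries such as a trailing comma.
    return [item.strip() for item in raw.split(",") if item.strip()]


ids = visible_device_ids({"CUDA_VISIBLE_DEVICES": "0,1,2,3,4,5,6,7"})
if ids is not None and len(ids) != 8:
    print("warning: expected 8 visible devices, found {}".format(len(ids)))
```

Passing a plain dict instead of reading `os.environ` keeps the helper easy to check without touching the real environment.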

Appendix: raw test results

λ yq01-gpu-255-137-12-00 /work/models/dygraph {develop} python parse_dataloader_test_result.py
./dataloader_test_log/mnist - total train time: 12.000037908554077 s
./dataloader_test_log/resnet - total train time: 94.40757536888123 s
./dataloader_test_log/se_resnet - total train time: 180.8534700870514 s
./dataloader_test_log/transformer - total train time: 93.55937242507935 s
./dataloader_test_log/mnist_sp - total train time: 11.432420492172241 s
./dataloader_test_log/resnet_sp - total train time: 83.5201735496521 s
./dataloader_test_log/se_resnet_sp - total train time: 134.5808563232422 s
./dataloader_test_log/transformer_sp - total train time: 93.33124494552612 s
./dataloader_test_log/mnist_mp - total train time: 9.447554111480713 s
./dataloader_test_log/resnet_mp - total train time: 63.58643341064453 s
./dataloader_test_log/se_resnet_mp - total train time: 128.87386989593506 s
./dataloader_test_log/transformer_mp - total train time: 93.00028419494629 s
./dataloader_test_log/mnist_8/workerlog.0 - total train time: 6.367103576660156 s
./dataloader_test_log/mnist_8/workerlog.1 - total train time: 6.469170331954956 s
./dataloader_test_log/mnist_8/workerlog.2 - total train time: 6.326692581176758 s
./dataloader_test_log/mnist_8/workerlog.3 - total train time: 6.332724571228027 s
./dataloader_test_log/mnist_8/workerlog.4 - total train time: 6.324522018432617 s
./dataloader_test_log/mnist_8/workerlog.5 - total train time: 6.312472343444824 s
./dataloader_test_log/mnist_8/workerlog.6 - total train time: 6.3679540157318115 s
./dataloader_test_log/mnist_8/workerlog.7 - total train time: 6.320514440536499 s
./dataloader_test_log/resnet_8/workerlog.0 - total train time: 57.158374071121216 s
./dataloader_test_log/resnet_8/workerlog.1 - total train time: 57.0883584022522 s
./dataloader_test_log/resnet_8/workerlog.2 - total train time: 57.085835218429565 s
./dataloader_test_log/resnet_8/workerlog.3 - total train time: 57.082090854644775 s
./dataloader_test_log/resnet_8/workerlog.4 - total train time: 57.22903871536255 s
./dataloader_test_log/resnet_8/workerlog.5 - total train time: 57.0818076133728 s
./dataloader_test_log/resnet_8/workerlog.6 - total train time: 56.99318337440491 s
./dataloader_test_log/resnet_8/workerlog.7 - total train time: 57.07746934890747 s
./dataloader_test_log/se_resnet_8/workerlog.0 - total train time: 67.84486436843872 s
./dataloader_test_log/se_resnet_8/workerlog.1 - total train time: 67.85854125022888 s
./dataloader_test_log/se_resnet_8/workerlog.2 - total train time: 67.86838150024414 s
./dataloader_test_log/se_resnet_8/workerlog.3 - total train time: 67.84253692626953 s
./dataloader_test_log/se_resnet_8/workerlog.4 - total train time: 67.87079882621765 s
./dataloader_test_log/se_resnet_8/workerlog.5 - total train time: 67.87530589103699 s
./dataloader_test_log/se_resnet_8/workerlog.6 - total train time: 67.89710235595703 s
./dataloader_test_log/se_resnet_8/workerlog.7 - total train time: 67.49056339263916 s
./dataloader_test_log/transformer_8/workerlog.0 - total train time: 22.80716848373413 s
./dataloader_test_log/transformer_8/workerlog.1 - total train time: 22.8052077293396 s
./dataloader_test_log/transformer_8/workerlog.2 - total train time: 22.810211420059204 s
./dataloader_test_log/transformer_8/workerlog.3 - total train time: 23.049880027770996 s
./dataloader_test_log/transformer_8/workerlog.4 - total train time: 22.928219318389893 s
./dataloader_test_log/transformer_8/workerlog.5 - total train time: 22.99123525619507 s
./dataloader_test_log/transformer_8/workerlog.6 - total train time: 22.811274528503418 s
./dataloader_test_log/transformer_8/workerlog.7 - total train time: 22.81301498413086 s
./dataloader_test_log/mnist_8_sp/workerlog.0 - total train time: 7.271549224853516 s
./dataloader_test_log/mnist_8_sp/workerlog.1 - total train time: 7.24362587928772 s
./dataloader_test_log/mnist_8_sp/workerlog.2 - total train time: 7.29660964012146 s
./dataloader_test_log/mnist_8_sp/workerlog.3 - total train time: 7.2771148681640625 s
./dataloader_test_log/mnist_8_sp/workerlog.4 - total train time: 7.337551116943359 s
./dataloader_test_log/mnist_8_sp/workerlog.5 - total train time: 7.274338245391846 s
./dataloader_test_log/mnist_8_sp/workerlog.6 - total train time: 7.2760045528411865 s
./dataloader_test_log/mnist_8_sp/workerlog.7 - total train time: 7.299149990081787 s
./dataloader_test_log/resnet_8_sp/workerlog.0 - total train time: 55.93130087852478 s
./dataloader_test_log/resnet_8_sp/workerlog.1 - total train time: 56.52779817581177 s
./dataloader_test_log/resnet_8_sp/workerlog.2 - total train time: 56.52779531478882 s
./dataloader_test_log/resnet_8_sp/workerlog.3 - total train time: 56.63466262817383 s
./dataloader_test_log/resnet_8_sp/workerlog.4 - total train time: 56.64422035217285 s
./dataloader_test_log/resnet_8_sp/workerlog.5 - total train time: 56.527549743652344 s
./dataloader_test_log/resnet_8_sp/workerlog.6 - total train time: 56.52781629562378 s
./dataloader_test_log/resnet_8_sp/workerlog.7 - total train time: 56.52778148651123 s
./dataloader_test_log/se_resnet_8_sp/workerlog.0 - total train time: 62.33003783226013 s
./dataloader_test_log/se_resnet_8_sp/workerlog.1 - total train time: 62.401503801345825 s
./dataloader_test_log/se_resnet_8_sp/workerlog.2 - total train time: 62.092822790145874 s
./dataloader_test_log/se_resnet_8_sp/workerlog.3 - total train time: 62.20855903625488 s
./dataloader_test_log/se_resnet_8_sp/workerlog.4 - total train time: 62.29902935028076 s
./dataloader_test_log/se_resnet_8_sp/workerlog.5 - total train time: 62.26711320877075 s
./dataloader_test_log/se_resnet_8_sp/workerlog.6 - total train time: 62.40175724029541 s
./dataloader_test_log/se_resnet_8_sp/workerlog.7 - total train time: 62.07155179977417 s
./dataloader_test_log/transformer_8_sp/workerlog.0 - total train time: 24.19667410850525 s
./dataloader_test_log/transformer_8_sp/workerlog.1 - total train time: 24.166663885116577 s
./dataloader_test_log/transformer_8_sp/workerlog.2 - total train time: 24.18856978416443 s
./dataloader_test_log/transformer_8_sp/workerlog.3 - total train time: 24.231191873550415 s
./dataloader_test_log/transformer_8_sp/workerlog.4 - total train time: 24.256746292114258 s
./dataloader_test_log/transformer_8_sp/workerlog.5 - total train time: 24.22942304611206 s
./dataloader_test_log/transformer_8_sp/workerlog.6 - total train time: 24.18540906906128 s
./dataloader_test_log/transformer_8_sp/workerlog.7 - total train time: 24.51620316505432 s
./dataloader_test_log/mnist_8_mp/workerlog.0 - total train time: 5.971302270889282 s
./dataloader_test_log/mnist_8_mp/workerlog.1 - total train time: 6.012018203735352 s
./dataloader_test_log/mnist_8_mp/workerlog.2 - total train time: 5.971257209777832 s
./dataloader_test_log/mnist_8_mp/workerlog.3 - total train time: 6.194690227508545 s
./dataloader_test_log/mnist_8_mp/workerlog.4 - total train time: 5.955884218215942 s
./dataloader_test_log/mnist_8_mp/workerlog.5 - total train time: 6.020132303237915 s
./dataloader_test_log/mnist_8_mp/workerlog.6 - total train time: 5.928117513656616 s
./dataloader_test_log/mnist_8_mp/workerlog.7 - total train time: 5.965324878692627 s
./dataloader_test_log/resnet_8_mp/workerlog.0 - total train time: 51.89928698539734 s
./dataloader_test_log/resnet_8_mp/workerlog.1 - total train time: 51.88896584510803 s
./dataloader_test_log/resnet_8_mp/workerlog.2 - total train time: 51.88880944252014 s
./dataloader_test_log/resnet_8_mp/workerlog.3 - total train time: 52.01701879501343 s
./dataloader_test_log/resnet_8_mp/workerlog.4 - total train time: 51.87888717651367 s
./dataloader_test_log/resnet_8_mp/workerlog.5 - total train time: 51.88829278945923 s
./dataloader_test_log/resnet_8_mp/workerlog.6 - total train time: 51.988335609436035 s
./dataloader_test_log/resnet_8_mp/workerlog.7 - total train time: 51.889004707336426 s
./dataloader_test_log/se_resnet_8_mp/workerlog.0 - total train time: 54.58965301513672 s
./dataloader_test_log/se_resnet_8_mp/workerlog.1 - total train time: 54.455613136291504 s
./dataloader_test_log/se_resnet_8_mp/workerlog.2 - total train time: 54.44919466972351 s
./dataloader_test_log/se_resnet_8_mp/workerlog.3 - total train time: 54.455557107925415 s
./dataloader_test_log/se_resnet_8_mp/workerlog.4 - total train time: 54.609676122665405 s
./dataloader_test_log/se_resnet_8_mp/workerlog.5 - total train time: 54.4555549621582 s
./dataloader_test_log/se_resnet_8_mp/workerlog.6 - total train time: 54.413665771484375 s
./dataloader_test_log/se_resnet_8_mp/workerlog.7 - total train time: 54.44967770576477 s
./dataloader_test_log/transformer_8_mp/workerlog.0 - total train time: 23.726247310638428 s
./dataloader_test_log/transformer_8_mp/workerlog.1 - total train time: 24.126054048538208 s
./dataloader_test_log/transformer_8_mp/workerlog.2 - total train time: 23.720133304595947 s
./dataloader_test_log/transformer_8_mp/workerlog.3 - total train time: 23.724708795547485 s
./dataloader_test_log/transformer_8_mp/workerlog.4 - total train time: 23.7517249584198 s
./dataloader_test_log/transformer_8_mp/workerlog.5 - total train time: 23.745847463607788 s
./dataloader_test_log/transformer_8_mp/workerlog.6 - total train time: 23.73387098312378 s
./dataloader_test_log/transformer_8_mp/workerlog.7 - total train time: 23.724374055862427 s
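
Output in this format can be turned back into table values with a few lines of parsing. The sketch below is not the actual parse_dataloader_test_result.py; it only assumes the line format shown in this appendix (`<path> - total train time: <seconds> s`), and `parse_result_lines` is a hypothetical name.

```python
import re

# One result line: "<path> - total train time: <seconds> s"
LINE_RE = re.compile(r"^(?P<path>\S+) - total train time: (?P<sec>[0-9.]+) s$")


def parse_result_lines(lines):
    """Map each log path to its total train time in seconds."""
    results = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip shell prompts and any unrelated output
            results[match.group("path")] = float(match.group("sec"))
    return results


sample = [
    "./dataloader_test_log/mnist - total train time: 12.000037908554077 s",
    "./dataloader_test_log/mnist_mp - total train time: 9.447554111480713 s",
    "./dataloader_test_log/mnist_8/workerlog.0 - total train time: 6.367103576660156 s",
]
times = parse_result_lines(sample)
for path, seconds in sorted(times.items()):
    # Three decimal places, matching the rounding used in the tables.
    print("{}: {:.3f}".format(path, seconds))
```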

chenwhql • Jan 16 '20 12:01