EasyParallelLibrary
How to understand epl training steps: single machine with one GPU vs. single machine with multiple GPUs
Single machine, single GPU:
Launch command: TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0 bash ./scripts/train_dp.sh
Single machine, two GPUs:
Launch command: TF_CONFIG='{"cluster":{"worker":["127.0.0.1:49119"]},"task":{"type":"worker","index":0}}' CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/train_dp.sh
I modified the code slightly: removed the last_step limit and set the dataset to repeat=10 (rename the attached .txt to .py and it can be run directly): resnet_dp.txt
Could someone explain how to interpret this? Did each GPU run 10 steps on its own?
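For reference, a minimal sketch of the kind of script described above (no last_step hook, dataset repeat=10). The toy model and input pipeline here are assumptions for illustration, not the actual resnet_dp.py; `epl.init()` and `epl.set_default_strategy(epl.replicate(device_count=1))` follow EPL's documented data-parallel usage:

```python
# Minimal data-parallel sketch; the dense model and random data are
# placeholders, not the real resnet_dp.py contents.
import numpy as np
import tensorflow as tf
import epl

epl.init()
# EPL data parallelism: replicate the model, one replica per visible GPU.
epl.set_default_strategy(epl.replicate(device_count=1))

# Toy input pipeline: 100 samples, repeated 10 times, per-GPU batch of 20.
features = np.random.rand(100, 32).astype(np.float32)
labels = np.random.randint(0, 10, size=(100,)).astype(np.int64)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.repeat(10).batch(20)
x, y = dataset.make_one_shot_iterator().get_next()

logits = tf.layers.dense(x, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# No last_step limit: training runs until the finite dataset is exhausted.
with tf.train.MonitoredTrainingSession() as sess:
    while not sess.should_stop():
        sess.run(train_op)
```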
In the current configuration, batch_size is the per-GPU batch size, so global_batch_size = batch_size * gpu_num. With the total amount of data unchanged, increasing the number of GPUs linearly reduces the number of steps per epoch.
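To make that arithmetic concrete, a small worked example (the sample count and batch size below are made-up numbers, not the actual resnet_dp configuration):

```python
# Illustrative numbers only, not the real resnet_dp config.
num_samples = 1000     # dataset size after repeat(...)
per_gpu_batch = 20     # the batch_size set in the script (per GPU)

for gpu_num in (1, 2, 4):
    global_batch = per_gpu_batch * gpu_num
    steps_per_epoch = num_samples // global_batch
    print(gpu_num, global_batch, steps_per_epoch)
# -> 1 GPU: 50 steps, 2 GPUs: 25 steps, 4 GPUs: 12 steps per epoch.
# Each step consumes gpu_num batches, so doubling the GPUs halves
# the steps needed to cover the same data.
```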