UER-py: multi-GPU run stalls in the transformer part

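When I run run_classifier.py, it uses all GPUs by default. nvidia-smi shows Volatile GPU-Util at 100% on all four Tesla T4 cards, but the code stalls in the transformer part. If I specify a single card, the speed goes back to normal!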

Oct 14 '20 01:10 SunshlnW

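By "stalls in the transformer", do you mean that the model actually gets slower once you use multiple GPUs? If the model is not large and the batch size is not large, multiple GPUs will be slower than a single GPU. If the model is fairly large and the batch size is set very large, the advantage of multiple GPUs becomes obvious.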

Oct 14 '20 08:10 zhezhaoa

It is the word-based model; batch_size is 64 and seq_length is 256.

Oct 14 '20 08:10 SunshlnW

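For downstream tasks the batch size is set to 64, so with 4 GPUs each GPU gets a batch size of 16. You can try a larger batch size. Could you briefly describe the speed? For example, the multi-GPU speed relative to the single-GPU speed.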

Oct 14 '20 09:10 zhezhaoa
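
The arithmetic in the reply above can be sketched as follows. This is only an illustration of how one fine-tuning batch is divided when it is split evenly across GPUs; the helper name `per_gpu_batch_size` is hypothetical and is not part of UER-py.

```python
def per_gpu_batch_size(batch_size: int, num_gpus: int) -> int:
    """Largest number of samples a single GPU receives when one batch
    is split as evenly as possible across num_gpus GPUs.

    Hypothetical helper for illustration; not a UER-py function.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Ceiling division: if the split is uneven, some GPU gets the larger chunk.
    return -(-batch_size // num_gpus)


if __name__ == "__main__":
    for gpus in (1, 2, 4):
        print(gpus, "GPU(s):", per_gpu_batch_size(64, gpus), "samples per GPU")
```

With batch_size 64 on 4 GPUs each card processes only 16 samples per step, so the cost of splitting inputs and gathering outputs can outweigh the compute saved.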

It works now. It really was related to batch_size: with a small batch_size and many GPUs, training does get slower. Thanks!

Oct 15 '20 02:10 SunshlnW

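Me again. I am trying to use pretrain.py for incremental training on my own corpus, with 2 GPUs and batch_size already set to 256. The print below in trainer.py never produces any output. Does the parallel part need some special setup?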

    if args.dist_train:
        # Initialize multiprocessing distributed training environment.
        dist.init_process_group(backend=args.backend,
                                init_method=args.master_ip,
                                world_size=args.world_size,
                                rank=rank)
        model = DistributedDataParallel(model, device_ids=[gpu_id])
        print("Worker %d is training ... " % rank)

Oct 16 '20 07:10 SunshlnW

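In the pretrain.py stage, the real batch size equals batch_size * world_size, which is different from the fine-tuning stage. If batch_size is set to 256, GPU memory should not be enough. My guess is that your preprocess and pretrain commands are wrong. You could check them again against the quickstart and instructions, or post the preprocess and pretrain commands here.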

Oct 16 '20 08:10 zhezhaoa
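
The relation stated in the reply above (real batch size = batch_size * world_size during pre-training) can be written out as a small sketch. The function name `effective_batch_size` is hypothetical and only mirrors the formula from the thread; it is not a UER-py API.

```python
def effective_batch_size(batch_size: int, world_size: int) -> int:
    """Total samples per optimization step when every one of the
    world_size processes consumes its own batch of batch_size samples.

    Hypothetical helper for illustration; not a UER-py function.
    """
    if batch_size < 1 or world_size < 1:
        raise ValueError("batch_size and world_size must be positive")
    return batch_size * world_size


if __name__ == "__main__":
    # The setup described earlier in the thread: batch_size 256 on 2 GPUs.
    print(effective_batch_size(256, 2))
```

Under this formula, batch_size 256 on 2 GPUs means 512 samples per step in total, and each GPU still has to hold a full batch of 256 sequences in memory.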

The preprocess command is:

python3 preprocess.py --corpus_path my_data.txt --vocab_path models/my_word_vocab.txt --dataset_path my_word_dataset.pt --processes_num 4 --target bert --tokenizer space --dynamic_masking --seq_length 256

The pretrain command is:

python3 pretrain.py --dataset_path my_word_dataset.pt --vocab_path models/my_word_vocab.txt --pretrained_model_path models/my_bert_word_model.bin --output_model_path models/my_bert_word_incremental_model.bin --world_size 4 --gpu_ranks 0 1 2 3 --total_steps 200000 --save_checkpoint_steps 50000 --report_steps 1 --encoder bert --target bert --batch_size 64

Oct 16 '20 11:10 SunshlnW
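
One thing worth checking, although the thread does not confirm it as the cause: the earlier message mentions 2 GPUs, while the command above passes --world_size 4 and --gpu_ranks 0 1 2 3. In PyTorch, `dist.init_process_group` blocks until world_size processes have joined, so a mismatch between these values and the hardware can keep a worker from ever reaching its first print. The sketch below is a hypothetical pre-flight check for a single-machine run; `check_dist_args` is not part of UER-py, and the number of visible GPUs is passed in as a plain integer (in a real script it could come from `torch.cuda.device_count()`).

```python
from typing import List


def check_dist_args(world_size: int, gpu_ranks: List[int],
                    visible_gpus: int) -> List[str]:
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper, assuming a single-machine run in which every
    rank in gpu_ranks is a separate process on its own GPU.
    """
    problems = []
    if len(set(gpu_ranks)) != len(gpu_ranks):
        problems.append("gpu_ranks contains duplicate ranks")
    if any(r < 0 or r >= world_size for r in gpu_ranks):
        problems.append("every rank must satisfy 0 <= rank < world_size")
    if len(gpu_ranks) != world_size:
        # Fewer local processes than world_size: the process group would
        # wait for ranks that are never started on this machine.
        problems.append("single machine: len(gpu_ranks) should equal world_size")
    if len(gpu_ranks) > visible_gpus:
        problems.append("more ranks requested than GPUs visible on this machine")
    return problems


if __name__ == "__main__":
    # Values from the command above, on a machine assumed to expose 2 GPUs.
    for problem in check_dist_args(4, [0, 1, 2, 3], visible_gpus=2):
        print("warning:", problem)
```

If the check reports a problem, aligning --world_size and --gpu_ranks with the number of GPUs actually available is a reasonable first thing to try before looking at the corpus format.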

Is your corpus in BERT format? Let's continue directly by email: [email protected]

Oct 17 '20 00:10 Embedding
