
BERT in Megatron-LM-v1.1.5-3D_parallelism does not support pipeline parallelism

eddy16112 opened this issue · 1 comment

I tried to run BERT with pipeline parallelism, but I get an error:

  File "DeepSpeedExamples/Megatron-LM-v1.1.5-3D_parallelism/pretrain_bert.py", line 146, in <module>
    args_defaults={'tokenizer_type': 'BertWordPieceLowerCase'})
  File "/DeepSpeedExamples/Megatron-LM-v1.1.5-3D_parallelism/megatron/training.py", line 81, in pretrain
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/DeepSpeedExamples/Megatron-LM-v1.1.5-3D_parallelism/megatron/training.py", line 252, in setup_model_and_optimizer
    model.set_batch_fn(model.module._megatron_batch_fn)
  File "/home/wwu/anaconda3/envs/sospx86/lib/python3.6/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'DeepSpeedEngine' object has no attribute 'set_batch_fn'

I dug into the code a little bit; it seems that pipeline parallelism is not implemented for BERT.

eddy16112 · Apr 22 '21 00:04

Hi @eddy16112 , thanks for your interest in 3D parallelism! At this time we have not adapted BERT to support pipeline parallelism. Only the GPT code path is supported.
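For illustration, here is a minimal self-contained sketch (using stand-in classes, not the real DeepSpeed API) of why the `AttributeError` appears: `deepspeed.initialize()` hands back a pipeline-aware engine, which defines `set_batch_fn`, only when the model it receives was built as a pipeline module; the BERT code path provides a plain model, so the resulting base engine has no `set_batch_fn` and the call in `training.py` raises.

```python
# Stand-in classes that mimic the engine dispatch; the names mirror
# DeepSpeed's, but this is a simplified mock, not the real library.
class PipelineModule:  # models partitioned for pipeline parallelism
    pass

class DeepSpeedEngine:  # base engine: no set_batch_fn defined
    def __init__(self, module):
        self.module = module

class PipelineEngine(DeepSpeedEngine):  # pipeline engine adds set_batch_fn
    def set_batch_fn(self, fn):
        self.batch_fn = fn

def initialize(model):
    # The engine class is chosen by the model's type: only a
    # PipelineModule yields an engine that has set_batch_fn.
    if isinstance(model, PipelineModule):
        return PipelineEngine(model)
    return DeepSpeedEngine(model)

bert_engine = initialize(object())        # BERT path: plain model
print(hasattr(bert_engine, "set_batch_fn"))   # → False (hence the crash)

gpt_engine = initialize(PipelineModule())  # GPT path: pipeline module
print(hasattr(gpt_engine, "set_batch_fn"))    # → True
```

Under this view, supporting BERT would require restructuring its model into pipeline stages, not just enabling a flag.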

ShadenSmith · May 14 '21 15:05