Megatron-LM
Load a pretrained T5 model from HuggingFace or Google
We tested the pre-training phase of T5 with model parallelism and data parallelism. It's a really easy framework to use, with high GPU utilization. But a couple of points took us a long time when loading pre-trained checkpoints from HuggingFace or the official Google release:
a) model structure differences, such as the SentencePiece tokenizer, LayerNorm without bias, attention without bias, etc.
b) manually splitting the pretrained checkpoint for 4-way model parallelism is painful; is there a better way to do this? We just maintain a list of which layers are cut by row or column (a rough sketch of that approach follows below).
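On point b), here is a minimal sketch of the row/column splitting idea, assuming a plain PyTorch state dict; the key suffixes in the two lists are illustrative, not the exact Megatron or HuggingFace parameter names:

```python
import torch

TP_SIZE = 4  # tensor/model-parallel degree

# Illustrative key suffixes, not exact Megatron/HuggingFace names.
# Column-parallel layers (QKV projection, first FFN matrix) are split along
# the output dim, which is dim 0 of a PyTorch [out, in] weight.
COLUMN_PARALLEL = ("attention.query_key_value.weight", "mlp.dense_h_to_4h.weight")
# Row-parallel layers (attention output projection, second FFN matrix) are
# split along the input dim, i.e. dim 1.
ROW_PARALLEL = ("attention.dense.weight", "mlp.dense_4h_to_h.weight")

def shard_state_dict(full_sd, tp_size=TP_SIZE):
    """Split a single checkpoint into one state dict per model-parallel rank."""
    shards = [dict() for _ in range(tp_size)]
    for name, tensor in full_sd.items():
        if name.endswith(COLUMN_PARALLEL):
            pieces = torch.chunk(tensor, tp_size, dim=0)
        elif name.endswith(ROW_PARALLEL):
            pieces = torch.chunk(tensor, tp_size, dim=1)
        else:
            # LayerNorm scales etc. are replicated; T5 has no attention/FFN
            # biases, so there is little else to split here. (Megatron also
            # splits the word embedding along the vocab dim; omitted here.)
            pieces = [tensor] * tp_size
        for rank, piece in enumerate(pieces):
            shards[rank][name] = piece.clone()
    return shards
```

One caveat: Megatron fuses Q/K/V into a single matrix with a per-partition head layout, so a real converter has to reorder the HF q/k/v weights before chunking rather than splitting a naive concatenation.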
I have the same problem, and I would like to know how you solved the position-embedding issue. For T5 models, HuggingFace uses relative position bias, while Megatron uses the classic absolute embedding method.
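For reference, the two checkpoints don't store comparable tensors here at all. A small sketch of what each side contains (the Megatron-side name and shape are assumptions, since they depend on the model config):

```python
import torch
from transformers import T5Model

hf = T5Model.from_pretrained("t5-base")
# HF T5: a relative-position bias table, present only in the first block of
# each stack, with shape [num_buckets, num_heads] (32 x 12 for t5-base).
rel_bias = hf.encoder.block[0].layer[0].SelfAttention.relative_attention_bias.weight
print(rel_bias.shape)

# Megatron's classic scheme (name/shape assumed here): a learned table indexed
# by absolute position, shape [max_seq_len, hidden_size]. Nothing in the HF
# checkpoint maps onto it, so one option is to initialize it fresh and
# fine-tune; another is to port T5's relative-bias logic into Megatron.
position_embeddings = torch.nn.Embedding(512, hf.config.d_model)
```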
I would also like to know how to convert a HuggingFace checkpoint to a Megatron checkpoint. We probably need something like the opposite of this: https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py If someone from Megatron-LM can give me some hints, I can go ahead and try to write the converter scripts. The minimal things to do, IMO, are:
- Write a converter from huggingface ckpt -> megatron ckpt
- Write a test to make sure at least the forward pass is the same for both models (a rough sketch follows below).
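For the second bullet, a minimal forward-pass check could look like the sketch below; the `megatron_model` argument is a hypothetical stand-in for whatever callable wraps the converted checkpoint and returns logits, and the tolerance depends on precision:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

@torch.no_grad()
def forward_pass_matches(megatron_model, hf_name="t5-small", atol=1e-3):
    """Compare HF and converted-Megatron logits on a single input.

    `megatron_model` is not a Megatron-LM object: it stands in for your own
    wrapper around the converted checkpoint that returns logits of shape
    [batch, seq, vocab] given (input_ids, attention_mask, decoder_input_ids).
    """
    tok = T5Tokenizer.from_pretrained(hf_name)
    hf_model = T5ForConditionalGeneration.from_pretrained(hf_name).eval()

    enc = tok("translate English to German: Hello world", return_tensors="pt")
    # T5 starts decoding from its decoder_start_token_id (the pad token).
    dec_ids = torch.tensor([[hf_model.config.decoder_start_token_id]])

    hf_logits = hf_model(input_ids=enc.input_ids,
                         attention_mask=enc.attention_mask,
                         decoder_input_ids=dec_ids).logits

    meg_logits = megatron_model(enc.input_ids, enc.attention_mask, dec_ids)
    return torch.allclose(hf_logits, meg_logits, atol=atol)
```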
Are there any updates on this question?
And the lm_head layer is also different from HuggingFace's.
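For context, HF's `lm_head` may or may not be tied to the shared embedding depending on the variant, which changes what a converter has to copy. A quick way to check on the HF side (how that weight lands on the Megatron side is left open):

```python
from transformers import T5ForConditionalGeneration

hf = T5ForConditionalGeneration.from_pretrained("t5-base")
# Original T5 (v1.0) ties lm_head to the shared embedding and scales the
# decoder output by d_model**-0.5 before the projection; t5-v1.1 / flan-t5
# keep an untied lm_head whose weight has to be copied explicitly.
tied = hf.lm_head.weight.data_ptr() == hf.shared.weight.data_ptr()
print("lm_head tied to shared embedding:", tied)
print("tie_word_embeddings:", hf.config.tie_word_embeddings)
```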
I have the same question but haven't found any helpful information.