Megatron-LM

Load T5 pretrained model from Hugging Face or Google

Open yixuan-qiao opened this issue 3 years ago • 8 comments

We tested the pre-training phase of T5 using model parallelism and data parallelism. It's a really easy framework to use, with high GPU utilization. But a few things took us a long time when loading pre-trained model checkpoints from Hugging Face or the official Google release:

a) Model structure differences, such as the SentencePiece tokenizer, LayerNorm without bias, attention without bias, etc.
b) Manually splitting the pretrained checkpoints for 4-way model parallelism is painful. Is there a better way to do this? For now we just maintain a list of which layers are cut by row and which by column.
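The hand-maintained "cut by row or column" list could be expressed as a small rule table, roughly as in the sketch below. This is only an illustration under stated assumptions: the parameter names in `SPLIT_RULES` are hypothetical (they are not Megatron-LM or Hugging Face names), weights are assumed to use the PyTorch `[out_features, in_features]` layout, and NumPy arrays stand in for real tensors. It also ignores details a real converter would likely have to handle, such as fused QKV layouts and vocabulary padding.

```python
import numpy as np

# Hypothetical rule table: substring of a parameter name -> axis to split along.
# Axis 0 splits output features (column-parallel style),
# axis 1 splits input features (row-parallel style).
SPLIT_RULES = {
    "attention.qkv.weight": 0,
    "attention.out.weight": 1,
    "mlp.up.weight": 0,
    "mlp.down.weight": 1,
}


def split_axis(name):
    """Return the split axis for a parameter, or None if it is replicated."""
    for pattern, axis in SPLIT_RULES.items():
        if pattern in name:
            return axis
    return None


def shard_state_dict(state, tp_size):
    """Split a name -> array mapping into tp_size per-rank mappings."""
    shards = [{} for _ in range(tp_size)]
    for name, tensor in state.items():
        axis = split_axis(name)
        if axis is None:
            # No rule matched: every rank keeps a full copy (e.g. LayerNorm).
            for shard in shards:
                shard[name] = tensor
            continue
        if tensor.shape[axis] % tp_size != 0:
            raise ValueError(f"{name}: dim {axis} not divisible by {tp_size}")
        for rank, piece in enumerate(np.split(tensor, tp_size, axis=axis)):
            shards[rank][name] = piece
    return shards


# Toy checkpoint with made-up shapes, split 4 ways.
state = {
    "layer0.attention.qkv.weight": np.zeros((24, 8)),
    "layer0.attention.out.weight": np.zeros((8, 8)),
    "layer0.norm.weight": np.ones(8),
}
for rank, shard in enumerate(shard_state_dict(state, tp_size=4)):
    print(rank, {name: value.shape for name, value in shard.items()})
```

Keeping the rules in one table makes it easy to verify a split by concatenating the shards along the same axis and comparing against the original tensor.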

yixuan-qiao avatar Sep 01 '21 00:09 yixuan-qiao

I have the same problem, and I would like to know how you solved the position embedding issue. For T5 models, Hugging Face uses relative position embeddings, but Megatron uses the classic (absolute) embedding method.
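To make the mismatch concrete, here is a rough NumPy sketch of T5-style relative position bucketing, modeled on the publicly described scheme (exact buckets for small offsets, logarithmically spaced buckets for larger ones). The function name, the default values, and the toy bias table are assumptions for illustration; this is not code taken from either library. An absolute scheme would instead look up one embedding per token position.

```python
import numpy as np


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map signed offsets (key position - query position) to bucket ids."""
    rel = np.asarray(relative_position, dtype=np.int64)
    offset = np.zeros_like(rel)
    if bidirectional:
        # Half of the buckets are reserved for positive offsets.
        num_buckets //= 2
        offset = (rel > 0).astype(np.int64) * num_buckets
        rel = np.abs(rel)
    else:
        # Only look backwards: positive offsets collapse to 0.
        rel = -np.minimum(rel, 0)
    max_exact = num_buckets // 2
    is_small = rel < max_exact
    # Clamp to 1 so the log is defined; those entries are masked out below.
    safe = np.maximum(rel, 1)
    large = max_exact + (
        np.log(safe / max_exact)
        / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    large = np.minimum(large, num_buckets - 1)
    return offset + np.where(is_small, rel, large)


# Bucket ids for a length-6 sequence, then a lookup into a toy bias table
# of shape [num_buckets, num_heads] (random values, for illustration only).
positions = np.arange(6)
buckets = relative_position_bucket(positions[None, :] - positions[:, None])
bias_table = np.random.default_rng(0).standard_normal((32, 4))
attention_bias = bias_table[buckets]  # shape [query, key, head]
print(buckets)
print(attention_bias.shape)
```

The point of the sketch is that the learned parameter is a per-bucket, per-head bias added to attention scores, so there is no one-to-one tensor in an absolute position embedding table to copy it into.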

Fizzmy avatar Apr 09 '23 11:04 Fizzmy

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 10 '23 18:07 github-actions[bot]

I would also like to know how to convert a Hugging Face checkpoint to a Megatron checkpoint. We probably need something that does the opposite of this: https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py If someone from Megatron-LM can give me some hints, I can go ahead and try to write the converter scripts. The minimal things to do, IMO, are:

  1. Write a converter from Hugging Face ckpt -> Megatron ckpt.
  2. Write some tests to make sure at least the forward pass is the same for both models.
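The forward-pass check in step 2 could look roughly like the sketch below. It is a minimal illustration, not an actual Megatron-LM or Hugging Face test: `max_abs_diff` is a hypothetical helper, and the two "models" are toy NumPy callables (a single linear layer and the same layer rebuilt from two column shards). In practice the callables would wrap the original and the converted model and return logits, and the tolerance would depend on the dtype used.

```python
import numpy as np


def max_abs_diff(reference_forward, converted_forward, inputs):
    """Largest elementwise output difference between two forward functions."""
    worst = 0.0
    for x in inputs:
        ref = np.asarray(reference_forward(x))
        out = np.asarray(converted_forward(x))
        if ref.shape != out.shape:
            raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
        worst = max(worst, float(np.max(np.abs(ref - out))))
    return worst


rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4))  # [out_features, in_features]


def reference_forward(x):
    # Stand-in for the original (unsplit) model.
    return x @ weight.T


# Stand-in for the converted model: the same weight split into two
# output-feature shards whose partial outputs are concatenated.
top, bottom = np.split(weight, 2, axis=0)


def converted_forward(x):
    return np.concatenate([x @ top.T, x @ bottom.T], axis=-1)


batches = [rng.standard_normal((3, 4)) for _ in range(5)]
diff = max_abs_diff(reference_forward, converted_forward, batches)
assert diff < 1e-6, f"forward pass mismatch: {diff}"
```

Reporting the maximum difference rather than a plain pass/fail makes it easier to tell a numerical-precision gap from a genuinely wrong weight mapping.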

sbmaruf avatar Aug 09 '23 05:08 sbmaruf

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Oct 08 '23 18:10 github-actions[bot]

Are there any updates on this question?

exitxingling avatar Jan 25 '24 07:01 exitxingling

Also, the lm_head layer is different from the Hugging Face one.

exitxingling avatar Jan 26 '24 08:01 exitxingling

I have the same question but haven't found any helpful information.

wanghao14 avatar Feb 04 '24 05:02 wanghao14

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Apr 04 '24 18:04 github-actions[bot]