Megatron-LM

Support for Megatron-VLM training

Open 1049451037 opened this issue 9 months ago • 6 comments

In this pull request, we open-source our solution for visual-language model training and inference in pure Megatron-style code. This codebase supports:

  1. A Megatron ViT model and its model weight converter.
  2. Uneven pipeline-parallel splits when the first pipeline stage contains the ViT. We find this speeds up training by a large margin.
  3. Sequence-parallel and context-parallel support for VLM training (for both the ViT and the LM), which is non-trivial because we must guarantee that the ViT on every rank receives gradients. (Since SP and CP split the sequence, some ranks contain only text tokens.)
  4. Independent pipeline-parallel sizes for the ViT and GPT models. (Megatron uses a single global mpu for all models.)
  5. Multi-modal inference code.

A runnable example is in the examples/llava folder.
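To see why point 3 is non-trivial, here is a minimal, dependency-free sketch (hypothetical illustration, not the PR's actual code): when context parallelism splits a sequence across `cp_size` ranks, only the ranks whose shard overlaps the image-token span actually hold image tokens, so the ViT on the other ranks would receive no gradient unless a dummy forward pass is added.

```python
# Hypothetical illustration (not Megatron's actual API): a sequence of
# seq_len tokens is split contiguously and evenly across cp_size ranks.
def shard_bounds(seq_len, cp_size, rank):
    """Half-open token range [lo, hi) owned by this CP rank."""
    chunk = seq_len // cp_size
    return rank * chunk, (rank + 1) * chunk

def ranks_with_image_tokens(seq_len, cp_size, img_start, img_end):
    """Ranks whose shard intersects the image-token span [img_start, img_end)."""
    out = []
    for r in range(cp_size):
        lo, hi = shard_bounds(seq_len, cp_size, r)
        if lo < img_end and img_start < hi:  # interval overlap test
            out.append(r)
    return out

# Example: 4096 tokens over 4 CP ranks; image tokens occupy [0, 576).
# Only rank 0's shard [0, 1024) overlaps, so ranks 1-3 see text only
# and their ViT replicas would get no gradient without extra handling.
print(ranks_with_image_tokens(4096, 4, 0, 576))  # -> [0]
```

In a real training step, ranks outside this set typically run a dummy ViT forward whose output is multiplied by zero, so every replica produces gradients and the data-parallel all-reduce stays consistent.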

We hope our work can contribute to the open-source community. Questions and feedback are welcome!

1049451037 avatar May 05 '24 14:05 1049451037

Hi. Thanks for creating this PR. We (NVIDIA) are actually planning to release VLM training functionality in Megatron core in the next couple of weeks. As you may have seen, we've been pushing out some preparatory code to support this. Our initial example release is going to be pretraining and SFT for a llava architecture model using llama3 and clip backbones and a general multimodal webdataset based dataloader. We're reviewing your PR internally to see if we can incorporate any of your work alongside ours and will be sure to credit you as such if we do.

Thanks again!

jon-barker avatar May 07 '24 21:05 jon-barker

Thank you for your attention! Looking forward to the official implementation!

1049451037 avatar May 08 '24 02:05 1049451037

Hello, I have a question about this PR: how are the ViT and LLM split across PP stages with independent_parallel = True? Thank you!

wangxiang2713 avatar Jun 13 '24 08:06 wangxiang2713

@wangxiang2713 The ViT will be placed in the first pipeline stage of the LM.
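One way to picture the resulting uneven split (a hypothetical sketch, not the PR's actual layer-assignment code; the function name and the choice of 5 layers for the first stage are assumptions): stage 0 hosts the ViT, so it is given fewer LM layers than the remaining stages.

```python
# Hypothetical sketch of an uneven pipeline-parallel split: stage 0
# also hosts the ViT, so it gets fewer LM transformer layers.
def split_layers(num_lm_layers, pp_size, first_stage_layers):
    """Return the number of LM layers per pipeline stage.

    Stage 0 gets first_stage_layers; the remaining layers are
    divided evenly across the other pp_size - 1 stages.
    """
    remaining = num_lm_layers - first_stage_layers
    per_stage = remaining // (pp_size - 1)
    return [first_stage_layers] + [per_stage] * (pp_size - 1)

# Example: a 32-layer LM over 4 pipeline stages, with stage 0 kept
# light (5 LM layers) because it also runs the ViT forward/backward.
print(split_layers(32, 4, 5))  # -> [5, 9, 9, 9]
```

Balancing `first_stage_layers` against the ViT's compute cost is what makes the uneven split pay off: an even split would leave stage 0 as the pipeline bottleneck.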

1049451037 avatar Jun 13 '24 08:06 1049451037

Tell me more about your questions.

felipeliliti avatar Jun 13 '24 10:06 felipeliliti

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Aug 12 '24 18:08 github-actions[bot]