Megatron-LM
Ongoing research training transformer models at scale
# Problem description

The file format output by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`. [bug issue](https://github.com/NVIDIA/Megatron-LM/issues/949)

# After fix

Under the original configuration, the conversion...
**How to customise train.sh for distributed Mamba training?** Hello, as I've seen in the Megatron modules, there isn't a pre-defined bash script to pre-train a Mamba model...
Suppose I have three datasets converted to the binary files **train1.bin, train1.idx, train2.bin, train2.idx, train3.bin, train3.idx.** During training, I want these three datasets to be merged into one and...
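If I recall the argument parsing correctly, Megatron's `--data-path` already accepts a weighted list of dataset prefixes, which blends the datasets at the sample level instead of physically merging the files. The snippet below is only a minimal sketch of that blending idea, with dataset sizes and mixing weights invented for illustration; it is not Megatron's actual blended-dataset code.

```python
import numpy as np

# Hypothetical sizes of the three indexed datasets (train1, train2, train3)
# and the desired mixing weights; all numbers here are made up.
dataset_sizes = [10_000, 20_000, 5_000]
weights = np.array([0.2, 0.3, 0.5])

rng = np.random.default_rng(seed=1234)

def sample_blended(num_samples: int):
    """Return (dataset_id, sample_id) pairs drawn according to the weights."""
    dataset_ids = rng.choice(len(dataset_sizes), size=num_samples,
                             p=weights / weights.sum())
    return [(int(d), int(rng.integers(dataset_sizes[d]))) for d in dataset_ids]

print(sample_blended(5))
```

In practice the per-dataset sample order would also need to be shuffled deterministically so every data-parallel rank sees the same blend, but the weighted choice above is the core of the idea.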
The URL in [this section](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#llama-2-inference-and-finetuning) is outdated and invalid; I think it should be updated.
## Description

For researchers and LLM practitioners, we need to debug distributed code to study how to improve our algorithms, so we need a distributed pdb to help us to...
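A common workaround while nothing official exists is a rank-gated breakpoint: one rank drops into pdb while the other ranks block at a barrier. The helper below is just a sketch of that pattern, not an existing Megatron utility, and its name is made up.

```python
import pdb
import sys

import torch.distributed as dist


def breakpoint_on_rank(rank: int = 0) -> None:
    """Hypothetical helper: open an interactive pdb session on one rank only.

    The chosen rank gets the debugger attached to the caller's frame;
    every other rank waits at the barrier until the session finishes.
    """
    if dist.is_initialized() and dist.get_rank() == rank:
        pdb.Pdb(stdout=sys.stdout).set_trace(sys._getframe().f_back)
    if dist.is_initialized():
        dist.barrier()
```

Called right before the suspect code path, this lets one rank be inspected with the usual pdb commands while the rest of the job stays alive; very long debug sessions can still hit the collective timeout, so a generous timeout at process-group init helps.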
**Your question** FlashAttention-3 has been released:

- Blogpost: https://tridao.me/blog/2024/flash3/
- Paper: https://tridao.me/publications/flash3/flash3.pdf

Could Megatron add support for FlashAttention-3 to improve training efficiency? FlashAttention-3 is optimized for Hopper GPUs...
Regarding lines 231-233 of megatron/core/pipeline_parallel/schedules.py ([megatron/core/pipeline_parallel/schedules.py](https://github.com/NVIDIA/Megatron-LM/blob/80c7c6e936e6868fe251eabb079ae5d84ee28311/megatron/core/pipeline_parallel/schedules.py#L231)), I have two questions:

1. Why are we dividing by num_tokens when the conditional is "if not config.calculate_per_token_loss"? (A toy example follows below.)
2. What is the purpose of...
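On question 1, the distinction at stake is easiest to see with toy numbers. The sketch below is not Megatron's reduction code; it only contrasts a single global per-token average with an average of per-microbatch means, which differ whenever microbatches contain different numbers of unmasked tokens.

```python
# Two hypothetical microbatches: summed loss and unmasked token count for each.
summed_losses = [4.0, 9.0]
token_counts = [2, 3]

# Global per-token average: total loss divided by total number of tokens.
per_token_avg = sum(summed_losses) / sum(token_counts)  # 13 / 5 = 2.6

# Average of per-microbatch means: every microbatch counts equally,
# regardless of how many tokens it contains.
per_microbatch_avg = sum(l / t for l, t in zip(summed_losses, token_counts)) \
    / len(summed_losses)                                 # (2.0 + 3.0) / 2 = 2.5

print(per_token_avg, per_microbatch_avg)
```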
I ran into an issue: I want to split the embedding layer out of the transformer block so that it sits alone in a single pipeline-parallel (PP) stage, but I found that it has not...
**Your question** The error message during the execution of the llama3 training .sh is as follows:

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/wangws/Megatron-LM/pretrain_gpt.py", line 243, in
[rank0]: ...
```
**Describe the bug** If the training data does not live on NFS but on node-specific storage, the current logic in https://github.com/NVIDIA/Megatron-LM/blob/0bc3547702464501feefeb5523b7a17e591b21fa/megatron/core/datasets/gpt_dataset.py#L346 skips building the indices and results in an error...
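One way to adapt the logic to node-local storage is to let the first rank on each node build the cache and make the other ranks wait, instead of relying on a single global rank plus a shared filesystem. The helper below is only a sketch under that assumption; `build_index` and `cache_path` are hypothetical placeholders, not Megatron APIs.

```python
import os

import torch.distributed as dist


def build_cache_per_node(build_index, cache_path: str) -> None:
    """Hypothetical helper: build the .idx/.bin index cache once per node.

    `build_index` is a callable that writes the index files to `cache_path`
    on this node's local disk; both names are placeholders for illustration.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if local_rank == 0 and not os.path.exists(cache_path):
        build_index(cache_path)      # only the first process on each node builds
    if dist.is_initialized():
        dist.barrier()               # remaining ranks wait until the cache exists
```

The barrier is global, so every node finishes (or finds) its own cache before any rank proceeds to memory-map the index files.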