Megatron-LM
Ongoing research training transformer models at scale
# Problem description

The file format output by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`. [bug issue](https://github.com/NVIDIA/Megatron-LM/issues/949)

# After fix

Under the original configuration, the conversion...
**How to customise train.sh for distributed Mamba training?** Hello, as I've seen in the Megatron modules, there isn't a pre-defined bash script to pre-train a Mamba model...
Suppose I have three datasets converted to the binary files **train1.bin, train1.idx, train2.bin, train2.idx, train3.bin, train3.idx.** During training, I want these three datasets to be merged into one and...
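If I recall the argument parsing correctly, Megatron's `--data-path` already accepts a weighted list of dataset prefixes, which blends the datasets at the sample level instead of physically merging the files. The snippet below is only a minimal sketch of that blending idea, with dataset sizes and mixing weights invented for illustration; it is not Megatron's actual blended-dataset code.

```python
import numpy as np

# Hypothetical sizes of the three indexed datasets (train1, train2, train3)
# and the desired mixing weights; all numbers here are made up.
dataset_sizes = [10_000, 20_000, 5_000]
weights = np.array([0.2, 0.3, 0.5])

rng = np.random.default_rng(seed=1234)

def sample_blended(num_samples: int):
    """Return (dataset_id, sample_id) pairs drawn according to the weights."""
    dataset_ids = rng.choice(len(dataset_sizes), size=num_samples,
                             p=weights / weights.sum())
    return [(int(d), int(rng.integers(dataset_sizes[d]))) for d in dataset_ids]

print(sample_blended(5))
```

In practice the per-dataset sample order would also need to be shuffled deterministically so every data-parallel rank sees the same blend, but the weighted choice above is the core of the idea.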
The URL in [this section](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#llama-2-inference-and-finetuning) is outdated and invalid; I think it should be updated.
## Description

For researchers and LLM practitioners, we need to debug distributed code to study how to improve our algorithms, so we need a distributed pdb to help us to...
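A common workaround while nothing official exists is a rank-gated breakpoint: one rank drops into pdb while the other ranks block at a barrier. The helper below is just a sketch of that pattern, not an existing Megatron utility, and its name is made up.

```python
import pdb
import sys

import torch.distributed as dist


def breakpoint_on_rank(rank: int = 0) -> None:
    """Hypothetical helper: open an interactive pdb session on one rank only.

    The chosen rank gets the debugger attached to the caller's frame;
    every other rank waits at the barrier until the session finishes.
    """
    if dist.is_initialized() and dist.get_rank() == rank:
        pdb.Pdb(stdout=sys.stdout).set_trace(sys._getframe().f_back)
    if dist.is_initialized():
        dist.barrier()
```

Called right before the suspect code path, this lets one rank be inspected with the usual pdb commands while the rest of the job stays alive; very long debug sessions can still hit the collective timeout, so a generous timeout at process-group init helps.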
**Your question** FlashAttention-3 has been released:

- Blogpost: https://tridao.me/blog/2024/flash3/
- Paper: https://tridao.me/publications/flash3/flash3.pdf

Could Megatron add support for FlashAttention-3 to improve training efficiency? FlashAttention-3 is optimized for Hopper GPUs...
Regarding lines 231-233 of megatron/core/pipeline_parallel/schedules.py ([megatron/core/pipeline_parallel/schedules.py](https://github.com/NVIDIA/Megatron-LM/blob/80c7c6e936e6868fe251eabb079ae5d84ee28311/megatron/core/pipeline_parallel/schedules.py#L231)), I have two questions:

1. Why are we dividing by num_tokens when the conditional is "if not config.calculate_per_token_loss"? (A toy example follows below.)
2. What is the purpose of...
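On question 1, the distinction at stake is easiest to see with toy numbers. The sketch below is not Megatron's reduction code; it only contrasts a single global per-token average with an average of per-microbatch means, which differ whenever microbatches contain different numbers of unmasked tokens.

```python
# Two hypothetical microbatches: summed loss and unmasked token count for each.
summed_losses = [4.0, 9.0]
token_counts = [2, 3]

# Global per-token average: total loss divided by total number of tokens.
per_token_avg = sum(summed_losses) / sum(token_counts)  # 13 / 5 = 2.6

# Average of per-microbatch means: every microbatch counts equally,
# regardless of how many tokens it contains.
per_microbatch_avg = sum(l / t for l, t in zip(summed_losses, token_counts)) \
    / len(summed_losses)                                 # (2.0 + 3.0) / 2 = 2.5

print(per_token_avg, per_microbatch_avg)
```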
I ran into an issue: I want to split the embedding layer out of the transformer block so that it sits alone in a single pipeline-parallel (PP) stage, but I found that it has not...
**Your question** The error message during the execution of the llama3 training .sh is as follows:

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/wangws/Megatron-LM/pretrain_gpt.py", line 243, in
[rank0]: ...
```
**Describe the bug** If the training data does not live on NFS but on node-specific storage, the current logic in https://github.com/NVIDIA/Megatron-LM/blob/0bc3547702464501feefeb5523b7a17e591b21fa/megatron/core/datasets/gpt_dataset.py#L346 skips building the indices and results in an error...
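One way to adapt the logic to node-local storage is to let the first rank on each node build the cache and make the other ranks wait, instead of relying on a single global rank plus a shared filesystem. The helper below is only a sketch under that assumption; `build_index` and `cache_path` are hypothetical placeholders, not Megatron APIs.

```python
import os

import torch.distributed as dist


def build_cache_per_node(build_index, cache_path: str) -> None:
    """Hypothetical helper: build the .idx/.bin index cache once per node.

    `build_index` is a callable that writes the index files to `cache_path`
    on this node's local disk; both names are placeholders for illustration.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if local_rank == 0 and not os.path.exists(cache_path):
        build_index(cache_path)      # only the first process on each node builds
    if dist.is_initialized():
        dist.barrier()               # remaining ranks wait until the cache exists
```

The barrier is global, so every node finishes (or finds) its own cache before any rank proceeds to memory-map the index files.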