Megatron-LM

Ongoing research training transformer models at scale

Results: 485 Megatron-LM issues (sorted by recently updated)

**Describe the bug** From the file Megatron-LM/megatron/training/arguments.py:

```
group.add_argument('--no-position-embedding', action='store_false',
                   help='Disable position embedding. Deprecated: use --position-embedding-type',
                   dest='add_position_embedding')
```

I can see that this argument is deprecated, but if we only...
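For context on the snippet above: `action='store_false'` with `dest=` makes the flag a negative switch that writes `False` into a positively named attribute, which is why the deprecation note matters for anyone reading the defaults. A minimal standalone sketch (the parser here is illustrative, not Megatron's actual argument setup):

```python
import argparse

# Sketch of how a --no-position-embedding style flag behaves:
# store_false writes False into args.add_position_embedding when the
# flag is present, and defaults the attribute to True when it is absent.
parser = argparse.ArgumentParser()
parser.add_argument('--no-position-embedding', action='store_false',
                    dest='add_position_embedding',
                    help='Disable position embedding.')

# Flag absent: store_false implies a default of True.
assert parser.parse_args([]).add_position_embedding is True
# Flag present: the attribute flips to False.
assert parser.parse_args(['--no-position-embedding']).add_position_embedding is False
```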

stale

**Describe the bug** If I repeatedly run test/unit_tests/dist-checkpoing/test_serialization.py, there is a small chance that it gets stuck. **To Reproduce** Step 1: cd tests/unit_tests; Step 2: bash ut.sh; the following...

Quite curious about the following code. At L94, the CUDA stream used for op computation waits on the CUDA stream used for P2P communication, and there would be no data...

I want to compare the training speed of llama2-7b between libai (https://github.com/Oneflow-Inc/libai) and Megatron-LM on an NVIDIA A800-SXM4-80G. But I find that the time of one iteration in nsys is longer than the...

stale

Does Megatron have plans to support LLaMA pre-training?

stale

**Your question** When I transfer the hf_weights to megatron_weights using "tools/checkpoint/util.py --model-type GPT", I get the following output information, and the...

element-wise multiplication with mask after torch softmax
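On the question above: multiplying by a mask *after* softmax breaks the normalization (the surviving weights no longer sum to 1), which is why attention implementations typically add a large negative value to masked logits *before* the softmax. A pure-Python sketch of the difference (not Megatron code; the toy logits and mask are made up):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5]
mask = [1, 1, 0]  # third position is masked out

# Masking AFTER softmax: zeroed entries leave the weights unnormalized.
post = [p * m for p, m in zip(softmax(logits), mask)]
assert abs(sum(post) - 1.0) > 1e-6  # no longer sums to 1

# Masking BEFORE softmax: -inf logits give exactly-zero weight,
# and the remaining weights still sum to 1.
pre = softmax([x if m else float('-inf') for x, m in zip(logits, mask)])
assert pre[2] == 0.0
assert abs(sum(pre) - 1.0) < 1e-9
```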

stale

Hi, thank you for your great work! I've been using Megatron-LM for some time, and I've encountered some problems in building a large dataset. I used [preprocess_data.py](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/preprocess_data.py) to build a...

bug

**Describe the bug** I am running the data preprocessing script with the following command:

```
python tools/preprocess_data.py \
    --input ./openwebtext/scraped_100/train_data.json \
    --output-prefix ./openwebtext/scraped_100/my_gpt2 \
    --vocab-file ./big_models/megatron-gpt-345m/gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type...
```
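As context for the command above: `tools/preprocess_data.py` reads its `--input` file as loose JSON, one object per line, with the document text under a `"text"` key by default. A hedged sketch of producing such a file (the file path and document contents are made up for illustration):

```python
import json
import tempfile

# Two toy documents in the one-JSON-object-per-line layout that the
# preprocessing script reads, with the text under the "text" key.
docs = [
    {"text": "The first training document."},
    {"text": "The second training document."},
]

with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    for doc in docs:
        f.write(json.dumps(doc) + '\n')
    path = f.name

# Each line parses back independently, as line-oriented JSON requires.
with open(path) as f:
    lines = [json.loads(line) for line in f]
assert [d["text"] for d in lines] == [d["text"] for d in docs]
```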

stale

What is the difference between the mcore and non-mcore model paths in pretrain_gpt.py? [pretrain_gpt.py#L33](https://github.com/NVIDIA/Megatron-LM/blob/5f9c870f9f24b482509699d206a9dbb00958f6fc/pretrain_gpt.py#L33)

stale