Megatron-LM

Ongoing research training transformer models at scale

Results: 485 Megatron-LM issues (sorted by recently updated)

**Describe the bug** From the file Megatron-LM/megatron/training/arguments.py:

```
group.add_argument('--no-position-embedding', action='store_false',
                   help='Disable position embedding. Deprecated: use --position-embedding-type',
                   dest='add_position_embedding')
```

I can see that this argument is deprecated, but if we only...
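For context on the snippet above: `action='store_false'` with `dest=` makes the flag a negative switch that writes `False` into a positively named attribute, which is why the deprecation note matters for anyone reading the defaults. A minimal standalone sketch (the parser here is illustrative, not Megatron's actual argument setup):

```python
import argparse

# Sketch of how a --no-position-embedding style flag behaves:
# store_false writes False into args.add_position_embedding when the
# flag is present, and defaults the attribute to True when it is absent.
parser = argparse.ArgumentParser()
parser.add_argument('--no-position-embedding', action='store_false',
                    dest='add_position_embedding',
                    help='Disable position embedding.')

# Flag absent: store_false implies a default of True.
assert parser.parse_args([]).add_position_embedding is True
# Flag present: the attribute flips to False.
assert parser.parse_args(['--no-position-embedding']).add_position_embedding is False
```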

stale

**Describe the bug** If I repeatedly run test/unit_tests/dist-checkpoing/test_serialization.py, there is a small chance that it gets stuck. **To Reproduce** Step 1: cd tests/unit_tests; Step 2: bash ut.sh; the following...

Quite curious about the following code. At L94, the CUDA stream used for op computation waits on the CUDA stream used for P2P communication, and there would be no data...

I want to compare the training speed of llama2-7b between libai (https://github.com/Oneflow-Inc/libai) and Megatron-LM on an NVIDIA A800-SXM4-80G. But I find that the time of one iteration in nsys is longer than the...

stale

Does Megatron have plans to support LLaMA pre-training?

stale

**Your question** When I transfer the hf_weights to megatron_weights using "tools/checkpoint/util.py --model-type GPT", I get the following output information, and the...

element-wise multiplication with mask after torch softmax
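On the question above: multiplying by a mask *after* softmax breaks the normalization (the surviving weights no longer sum to 1), which is why attention implementations typically add a large negative value to masked logits *before* the softmax. A pure-Python sketch of the difference (not Megatron code; the toy logits and mask are made up):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5]
mask = [1, 1, 0]  # third position is masked out

# Masking AFTER softmax: zeroed entries leave the weights unnormalized.
post = [p * m for p, m in zip(softmax(logits), mask)]
assert abs(sum(post) - 1.0) > 1e-6  # no longer sums to 1

# Masking BEFORE softmax: -inf logits give exactly-zero weight,
# and the remaining weights still sum to 1.
pre = softmax([x if m else float('-inf') for x, m in zip(logits, mask)])
assert pre[2] == 0.0
assert abs(sum(pre) - 1.0) < 1e-9
```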

stale

Hi, thank you for your great work! I've been using Megatron-LM for some time, and I've encountered some problems in building a large dataset. I used [preprocess_data.py](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/preprocess_data.py) to build a...

bug

**Describe the bug** I am running the data preprocessing script with the following command:

```
python tools/preprocess_data.py \
    --input ./openwebtext/scraped_100/train_data.json \
    --output-prefix ./openwebtext/scraped_100/my_gpt2 \
    --vocab-file ./big_models/megatron-gpt-345m/gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type...
```
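As context for the command above: `tools/preprocess_data.py` reads its `--input` file as loose JSON, one object per line, with the document text under a `"text"` key by default. A hedged sketch of producing such a file (the file path and document contents are made up for illustration):

```python
import json
import tempfile

# Two toy documents in the one-JSON-object-per-line layout that the
# preprocessing script reads, with the text under the "text" key.
docs = [
    {"text": "The first training document."},
    {"text": "The second training document."},
]

with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    for doc in docs:
        f.write(json.dumps(doc) + '\n')
    path = f.name

# Each line parses back independently, as line-oriented JSON requires.
with open(path) as f:
    lines = [json.loads(line) for line in f]
assert [d["text"] for d in lines] == [d["text"] for d in docs]
```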

stale

What is the difference between the mcore and non-mcore model paths in pretrain_gpt.py? [pretrain_gpt.py#L33](https://github.com/NVIDIA/Megatron-LM/blob/5f9c870f9f24b482509699d206a9dbb00958f6fc/pretrain_gpt.py#L33)

stale