Megatron-LM issues

[core dataset compilation error]

3

**Describe the bug** When I use the most recent Megatron-LM fork, I get the following error: ``` make: Entering directory '/workspace/megatron-lm/megatron/core/datasets' g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/usr/include/python3.10...

shamanez

stale

Support for Megatron-VLM training

6

In this pull request, we open source our solution for visual-language model training and inference in pure Megatron style code. In this codebase, we support: 1. Megatron ViT model, and...

1049451037

stale

Why does Megatron's ParallelAttention currently only support SelfAttention with a 'causal' MaskType?

2

Could you please explain why Megatron's ParallelAttention currently only supports SelfAttention with a 'causal' MaskType? Also, is there potential for flashAttention support in cases where the Mask is 'None' for...

chenfengshijie

[QUESTION] Does Megatron-Core support LLAMA models?

5

Does Megatron-Core support LLAMA models?

noob-ctrl

use _all_gather_base instead of all_gather

1

When using all_gather, the output is a list; in torch's implementation the list is flattened and unflattened, which results in additional allocation of GPU memory...

taozhiwei

stale

Huggingface <-> Megatron-LM Compatibility

25

Looking for a way to convert model weights between huggingface and Megatron-LM. (1): Continual pretraining from pretrained weights from huggingface (2): Convert Megatron-LM model weights to huggingface It shouldn't be...

usuyama

Fused kernel compilation could get stuck

17

Hi, I've noticed that the program can get stuck at "using torch.float16 for parameters ...". I found that it was stuck compiling fused_kernels, and deleting megatron/fused_kernel/build seems to...

rhythmswing

bug

stale
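The workaround mentioned in this report (deleting `megatron/fused_kernel/build`) can be sketched as a small helper. This is only an illustration: the function name `clear_fused_kernel_build` and its parameters are hypothetical, and the directory path is taken verbatim from the excerpt above, so adjust it to whatever your checkout actually uses.

```python
import shutil
import tempfile
from pathlib import Path


def clear_fused_kernel_build(repo_root, build_subdir="megatron/fused_kernel/build"):
    """Remove a stale fused-kernel build directory (hypothetical helper).

    Returns True if a directory was removed, False if there was nothing to remove.
    The default sub-path is the one quoted in the issue text.
    """
    build_dir = Path(repo_root) / build_subdir
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        return True
    return False


# Demo against a throwaway directory, so no real checkout is touched.
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp) / "megatron/fused_kernel/build"
    stale.mkdir(parents=True)
    (stale / "lock").touch()  # stand-in for a leftover build artifact
    removed_first = clear_fused_kernel_build(tmp)
    removed_again = clear_fused_kernel_build(tmp)
    still_there = stale.exists()
```

Running such a helper before launching training would force the kernels to rebuild from scratch; whether that resolves the hang in a given setup is not confirmed by the excerpt.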

[QUESTION] How to pre-build the dataset's index?

1

How to pre-build the dataset's index? I want to avoid using a compute node for this task: ``` > WARNING: could not find index map files, building the indices on...

etiennemlb

Speed up the creation of attention mask

1

Prefer the in-place variants triu_/tril_, which have been faster than the out-of-place variants since torch 2.3.0 (https://github.com/pytorch/pytorch/pull/115013).

yuantailing

add argument `--rotary-base` for gpt model

1

add argument `--rotary-base` for gpt model

TING2938

stale
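As a rough illustration of what a `--rotary-base` argument would control, the sketch below wires the flag into a standalone `argparse` parser and uses it to compute rotary inverse frequencies. Everything except the flag name is an assumption: `build_parser`, `rotary_inv_freq`, the `int` type, and the default of 10000 (the conventional RoPE base) are hypothetical and not taken from the pull request.

```python
import argparse


def build_parser():
    # Hypothetical standalone parser; the real argument wiring may differ.
    parser = argparse.ArgumentParser(description="rotary-base sketch")
    parser.add_argument(
        "--rotary-base",
        type=int,
        default=10000,  # assumed default: the conventional RoPE base
        help="Base used to compute rotary embedding frequencies.",
    )
    return parser


def rotary_inv_freq(dim, base):
    # Inverse frequencies base**(-i/dim) for even i in [0, dim),
    # i.e. one frequency per pair of rotary dimensions.
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


args = build_parser().parse_args(["--rotary-base", "500000"])
inv_freq = rotary_inv_freq(8, args.rotary_base)
```

A larger base lowers the frequencies, so positions rotate more slowly across the sequence.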
