Megatron-LM

Ongoing research training transformer models at scale

Results 294 Megatron-LM issues

**Describe the bug** When I use the most recent Megatron-LM fork, I get the following error: ``` make: Entering directory '/workspace/megatron-lm/megatron/core/datasets' g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/usr/include/python3.10...

stale

In this pull request, we open-source our solution for visual-language model training and inference in pure Megatron-style code. In this codebase, we support: 1. Megatron ViT model, and...

stale

Could you please explain why Megatron's ParallelAttention currently only supports SelfAttention with a 'causal' MaskType? Also, is there potential for FlashAttention support in cases where the mask is 'None' for...

When using allgather, the output is a list, and in torch's implementation the list is flattened and unflattened, which results in additional allocation of GPU memory...

stale

Looking for a way to convert model weights between Hugging Face and Megatron-LM: (1) continual pretraining from pretrained Hugging Face weights; (2) converting Megatron-LM model weights to Hugging Face format. It shouldn't be...

Hi, I've noticed that the program can get stuck at "using torch.float16 for parameters ...". I found that it was stuck compiling fused_kernels, and deleting megatron/fused_kernel/build seems to...

bug
stale

How can I pre-build the dataset's index? I want to avoid using a compute node for this task: ``` > WARNING: could not find index map files, building the indices on...

Prefer to use the inplace variant of triu_/tril_ because they are faster than the out-of-place variants since torch 2.3.0 (https://github.com/pytorch/pytorch/pull/115013).

add argument `--rotary-base` for gpt model

stale
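
For readers unfamiliar with what a `--rotary-base` argument typically controls, here is a minimal sketch in plain Python. It is not the code from the pull request above: the helper names `build_parser` and `rotary_inv_freq`, the default of 10000, and the head dimension of 8 are all assumptions chosen for illustration. It shows how a configurable base would feed into the usual rotary position embedding inverse frequencies, `base ** (-2i / d)`.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag name --rotary-base comes from the listing.
    parser = argparse.ArgumentParser(description="illustrative RoPE options")
    parser.add_argument(
        "--rotary-base",
        type=int,
        default=10000,  # assumed default, commonly used for rotary embeddings
        help="base used to compute rotary embedding frequencies",
    )
    return parser


def rotary_inv_freq(head_dim, base):
    # Inverse frequencies base^(-2i/d) for i = 0 .. d/2 - 1.
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--rotary-base", "500000"])
inv_freq = rotary_inv_freq(8, args.rotary_base)
```

A larger base lowers every frequency after the first, so the rotation angle grows more slowly with position.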