Megatron-LM
Ongoing research training transformer models at scale
**Describe the bug** When I use nvcr.io/nvidia/pytorch:24.07 to run run_simple_mcore_train_loop.py at commit 094d66b (the newest), something seems wrong in megatron/core/transformer/custom_layers/**transformer_engine.py**: get_cpu_offload_context() does not match the installed version of transformer-engine...
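Not the repository's fix, but a minimal sketch of how such a signature mismatch between the calling code and the installed transformer-engine can be detected early (the `expected_params` set and the helper name are hypothetical):

```python
# Sketch of a defensive check around transformer-engine's CPU-offload hook.
# Not the repository's fix; the expected parameter set is supplied by the caller.
import inspect
from importlib.metadata import version

import transformer_engine.pytorch as te_pytorch

def check_cpu_offload_signature(expected_params: set[str]) -> None:
    """Fail early with a clear message when the installed transformer-engine
    exposes a get_cpu_offload_context() whose signature differs from what
    the calling code was written against."""
    actual = set(inspect.signature(te_pytorch.get_cpu_offload_context).parameters)
    if not expected_params <= actual:
        te_version = version("transformer-engine")
        raise RuntimeError(
            f"transformer-engine {te_version}: get_cpu_offload_context() "
            f"is missing parameters {sorted(expected_params - actual)}; "
            "please install a matching transformer-engine release."
        )
```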
**Describe the bug** I am currently working with the LLaVA model in Megatron. I tested tensor parallelism and it works well. However, when I enable pipeline parallelism, it gets stuck during initialization....
https://github.com/NVIDIA/Megatron-LM/blob/6bf8448ba065a0a37b2b874f49fd65ca9547b5c0/megatron/core/tensor_parallel/layers.py#L907
The shape of vocab_parallel_logits is [seq_len, batch_size, vocab_size / tp]. If vocab_size is very large, as in Llama3, an in-place subtraction would reduce memory usage.
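A minimal sketch of the suggestion (omitting the tensor-parallel all-reduce of the max for brevity; the shapes are toy stand-ins for [seq_len, batch_size, vocab_size / tp]): subtracting in place avoids allocating a second logits-sized tensor.

```python
import torch

# Toy stand-in for vocab_parallel_logits with shape
# [seq_len, batch_size, vocab_size / tp].
vocab_parallel_logits = torch.randn(8, 2, 1024)
logits_max = vocab_parallel_logits.max(dim=-1)[0]

# Out-of-place subtraction allocates a second logits-sized tensor:
#   vocab_parallel_logits = vocab_parallel_logits - logits_max.unsqueeze(dim=-1)

# In-place subtraction reuses the existing buffer, which matters when
# vocab_size / tp is large (e.g. Llama3's 128k vocabulary):
vocab_parallel_logits -= logits_max.unsqueeze(dim=-1)
```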
**Description:** When training with GroupedMLP and Tensor Parallel (TP) enabled, and with `gated_linear_unit` turned on, the activation function is applied to fc1_output. Assuming a TP degree of 2, this intermediate output...
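For context, a gated linear unit splits fc1's output in half along the hidden dimension, so under tensor parallelism the split has to be applied to each rank's local shard. A minimal SwiGLU-style sketch of that pattern with toy shapes (illustrative only, not the actual GroupedMLP kernel):

```python
import torch
import torch.nn.functional as F

def glu_activation(fc1_output: torch.Tensor) -> torch.Tensor:
    """Gated linear unit: split fc1's output in half along the hidden
    dimension and gate one half with the activation of the other.
    (Illustrative; mirrors the general SwiGLU pattern.)"""
    gate, value = torch.chunk(fc1_output, 2, dim=-1)
    return F.silu(gate) * value

# With TP=2, fc1's output columns are sharded, so each rank only holds a
# slice of the [..., 2 * ffn_hidden] tensor. The split above must therefore
# be applied per local shard (each rank holding its own gate/value columns),
# not to the gathered global tensor, or gate and value halves get mixed up.
local_fc1_output = torch.randn(8, 2, 2 * 1024)  # one rank's shard (toy shapes)
out = glu_activation(local_fc1_output)          # -> [8, 2, 1024]
```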
```python # Importing the required libraries import json # Defining the areas the Liliti STK 3.6.9 project will cover with modern features areas = { "Saúde": { "Análise de Diagnóstico":...
Please see https://github.com/NVIDIA/Megatron-LM/pull/902
**Describe the bug** I followed [llama_mistral.md](https://github.com/NVIDIA/Megatron-LM/blob/main/docs/llama_mistral.md) with the Mistral 7B model (and also with a Llama model). However, it raises the error below. ```using world size: 1, data-parallel size: 1, context-parallel size: 1...
After recently updating to the main branch of Megatron-LM, I encountered this error when loading a model: ``` Unexpected key(s) in state_dict: "decoder.layers.0.self_attention.core_attention._extra_state" ``` The checkpoint was converted with `tools/checkpoint/convert.py`,...
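Not the repository's fix, but one common workaround sketch for this class of mismatch is to drop the Transformer Engine `_extra_state` entries before calling `load_state_dict`. This is only reasonable when that extra state is not needed (e.g. no FP8 metadata to restore); the helper name and checkpoint path handling below are illustrative.

```python
# Hypothetical workaround sketch: strip "_extra_state" entries from a
# converted checkpoint before loading it into the model. Verify for your
# own setup that this state is safe to discard.
import torch

def load_without_extra_state(model: torch.nn.Module, ckpt_path: str) -> None:
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)
    filtered = {
        k: v for k, v in state_dict.items() if not k.endswith("._extra_state")
    }
    # strict=False tolerates remaining benign mismatches; inspect the returned
    # key lists rather than silently ignoring them.
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
```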