Megatron-LM
Ongoing research training transformer models at scale
**Describe the bug** When I use nvcr.io/nvidia/pytorch:24.07 to run run_simple_mcore_train_loop.py at commit 094d66b (the newest), something seems wrong in megatron/core/transformer/custom_layers/**transformer_engine.py**: get_cpu_offload_context() does not match the installed version of transformer-engine...
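Not the repository's fix, but a minimal sketch of how such a signature mismatch between the calling code and the installed transformer-engine can be detected early (the `expected_params` set and the helper name are hypothetical):

```python
# Sketch of a defensive check around transformer-engine's CPU-offload hook.
# Not the repository's fix; the expected parameter set is supplied by the caller.
import inspect
from importlib.metadata import version

import transformer_engine.pytorch as te_pytorch

def check_cpu_offload_signature(expected_params: set[str]) -> None:
    """Fail early with a clear message when the installed transformer-engine
    exposes a get_cpu_offload_context() whose signature differs from what
    the calling code was written against."""
    actual = set(inspect.signature(te_pytorch.get_cpu_offload_context).parameters)
    if not expected_params <= actual:
        te_version = version("transformer-engine")
        raise RuntimeError(
            f"transformer-engine {te_version}: get_cpu_offload_context() "
            f"is missing parameters {sorted(expected_params - actual)}; "
            "please install a matching transformer-engine release."
        )
```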
**Describe the bug** I am currently working with the LLaVA model in Megatron. I tested tensor parallelism and it works well. However, when I enable pipeline parallelism, it gets stuck during initialization....
https://github.com/NVIDIA/Megatron-LM/blob/6bf8448ba065a0a37b2b874f49fd65ca9547b5c0/megatron/core/tensor_parallel/layers.py#L907
The shape of vocab_parallel_logits is [seq_len, batch_size, vocab_size / tp]. If vocab_size is very large, as in Llama3, an in-place subtraction would reduce memory usage.
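A minimal sketch of the suggestion (omitting the tensor-parallel all-reduce of the max for brevity; the shapes are toy stand-ins for [seq_len, batch_size, vocab_size / tp]): subtracting in place avoids allocating a second logits-sized tensor.

```python
import torch

# Toy stand-in for vocab_parallel_logits with shape
# [seq_len, batch_size, vocab_size / tp].
vocab_parallel_logits = torch.randn(8, 2, 1024)
logits_max = vocab_parallel_logits.max(dim=-1)[0]

# Out-of-place subtraction allocates a second logits-sized tensor:
#   vocab_parallel_logits = vocab_parallel_logits - logits_max.unsqueeze(dim=-1)

# In-place subtraction reuses the existing buffer, which matters when
# vocab_size / tp is large (e.g. Llama3's 128k vocabulary):
vocab_parallel_logits -= logits_max.unsqueeze(dim=-1)
```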
**Description:** When training with GroupedMLP and Tensor Parallel (TP) enabled, and with `gated_linear_unit` turned on, the activation function is applied to fc1_output. Assuming a TP degree of 2, this intermediate output...
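For context, a gated linear unit splits fc1's output in half along the hidden dimension, so under tensor parallelism the split has to be applied to each rank's local shard. A minimal SwiGLU-style sketch of that pattern with toy shapes (illustrative only, not the actual GroupedMLP kernel):

```python
import torch
import torch.nn.functional as F

def glu_activation(fc1_output: torch.Tensor) -> torch.Tensor:
    """Gated linear unit: split fc1's output in half along the hidden
    dimension and gate one half with the activation of the other.
    (Illustrative; mirrors the general SwiGLU pattern.)"""
    gate, value = torch.chunk(fc1_output, 2, dim=-1)
    return F.silu(gate) * value

# With TP=2, fc1's output columns are sharded, so each rank only holds a
# slice of the [..., 2 * ffn_hidden] tensor. The split above must therefore
# be applied per local shard (each rank holding its own gate/value columns),
# not to the gathered global tensor, or gate and value halves get mixed up.
local_fc1_output = torch.randn(8, 2, 2 * 1024)  # one rank's shard (toy shapes)
out = glu_activation(local_fc1_output)          # -> [8, 2, 1024]
```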
```python # Importing the required libraries import json # Defining the areas the Liliti STK 3.6.9 project will cover with modern features areas = { "Saúde": { "Análise de Diagnóstico":...
Please see https://github.com/NVIDIA/Megatron-LM/pull/902
**Describe the bug** I followed [llama_mistral.md](https://github.com/NVIDIA/Megatron-LM/blob/main/docs/llama_mistral.md) with the Mistral 7B model (and also with a Llama model). However, it raises the error below. ```using world size: 1, data-parallel size: 1, context-parallel size: 1...
After recently updating to the main branch of Megatron-LM, I encountered this error when loading a model: ``` Unexpected key(s) in state_dict: "decoder.layers.0.self_attention.core_attention._extra_state" ``` The checkpoint was converted with `tools/checkpoint/convert.py`,...
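Not the repository's fix, but one common workaround sketch for this class of mismatch is to drop the Transformer Engine `_extra_state` entries before calling `load_state_dict`. This is only reasonable when that extra state is not needed (e.g. no FP8 metadata to restore); the helper name and checkpoint path handling below are illustrative.

```python
# Hypothetical workaround sketch: strip "_extra_state" entries from a
# converted checkpoint before loading it into the model. Verify for your
# own setup that this state is safe to discard.
import torch

def load_without_extra_state(model: torch.nn.Module, ckpt_path: str) -> None:
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)
    filtered = {
        k: v for k, v in state_dict.items() if not k.endswith("._extra_state")
    }
    # strict=False tolerates remaining benign mismatches; inspect the returned
    # key lists rather than silently ignoring them.
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
```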