Megatron-LM

Ongoing research training transformer models at scale

Results: 294 Megatron-LM issues, sorted by recently updated

Fix the bug where the optimizer never actually uses multi_tensor_applier under float16, because the truthiness check on overflow_buf always evaluates to False. Specifically, `overflow_buf = self._dummy_overflow_buf`, and `self._dummy_overflow_buf` is initialized as `torch.tensor([0], dtype=torch.int, device='cuda')` under...

stale
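
A minimal sketch of the copy helper the issue above describes, assuming it follows the shape of Megatron-LM's `_multi_tensor_copy_this_to_that`; exact names and signatures may differ from the actual source. The point is that `if overflow_buf:` asks for the truth value of a tensor initialized to `torch.tensor([0], ...)`, which is always False, so the fused path is never taken.

```python
import torch

try:
    import amp_C
    from apex.multi_tensor_apply import multi_tensor_applier
    HAVE_APEX = True
except ImportError:
    HAVE_APEX = False


def copy_this_to_that(this, that, overflow_buf=None):
    """Copy each tensor in `this` into the matching tensor in `that`."""
    # Buggy form: `if overflow_buf:` evaluates the tensor's truth value.
    # With overflow_buf = torch.tensor([0], dtype=torch.int, device='cuda')
    # that is always False, so the fused kernel below never runs.
    # A fix is to test for presence instead of truthiness:
    if HAVE_APEX and overflow_buf is not None:
        overflow_buf.fill_(0)
        # One fused scale-by-1.0 copy across the whole list of tensors.
        multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf,
                             [this, that], 1.0)
    else:
        # Fallback: one copy kernel per tensor.
        for src, dst in zip(this, that):
            dst.copy_(src)
```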

**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior. The easier it is to reproduce the faster it will...

stale

**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd...

stale

Fixed the bug that prevents configuring datasets with --train-data-path, --valid-data-path, and --test-data-path. When --split is not explicitly configured, it is set to the default value 969,...

stale
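
A hedged sketch of the interaction the PR above addresses, assuming argparse-style flag handling; the argument names mirror the ones in the PR description, but the default-split logic here is illustrative rather than the repository's actual code.

```python
import argparse

DEFAULT_SPLIT = "969, ..."  # placeholder for the repository's documented default split string

parser = argparse.ArgumentParser()
parser.add_argument('--data-path', nargs='*', default=None)
parser.add_argument('--train-data-path', nargs='*', default=None)
parser.add_argument('--valid-data-path', nargs='*', default=None)
parser.add_argument('--test-data-path', nargs='*', default=None)
parser.add_argument('--split', type=str, default=None)
args = parser.parse_args()

separate_paths = any([args.train_data_path, args.valid_data_path,
                      args.test_data_path])

if args.split is None and not separate_paths:
    # Only fall back to the default train/valid/test split when a single
    # blended --data-path is used.  With per-split paths, --split should stay
    # unset, otherwise the dataset builder tries to re-split data that is
    # already divided into train/valid/test.
    args.split = DEFAULT_SPLIT
```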

**Describe the bug** When I configure datasets for a training task using --train-data-path, --valid-data-path, and --test-data-path, running the training task fails with an error. The error message is shown in...

stale

**Your question** When we want to train LLMs on a large amount of corpora, I understand that the usual approach is to provide the documents in the following...
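
Since the question above is truncated, here is a hedged example of the corpus layout commonly fed to Megatron-LM's `tools/preprocess_data.py`: a loose-JSON file with one `{"text": ...}` object per document. The flag names in the trailing comment reflect typical usage and may differ between versions.

```python
import json

documents = [
    "First document of the corpus ...",
    "Second document of the corpus ...",
]

# One JSON object per line; the key must match --json-keys (usually "text").
with open("corpus.jsonl", "w", encoding="utf-8") as f:
    for doc in documents:
        f.write(json.dumps({"text": doc}) + "\n")

# The file is then tokenized into the binary .bin/.idx dataset with something
# like (illustrative paths and tokenizer settings):
#   python tools/preprocess_data.py \
#       --input corpus.jsonl --output-prefix my-corpus --json-keys text \
#       --tokenizer-type GPT2BPETokenizer --vocab-file gpt2-vocab.json \
#       --merge-file gpt2-merges.txt --append-eod --workers 8
```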

**Your question** Is `backward` below supposed to be `forward`? ![image](https://github.com/user-attachments/assets/cf20f652-2648-4af6-92ac-b718229408f9)

**Your question** Is there a way to start training a Llama 2 model with a Llama 3 tokenizer? I plan on doing all the pretraining myself; if this is possible and someone can provide...

**Describe the bug** When the order of the parameters (FP16/BF16) in the buffer differs from the order in which the model executes them in the forward pass, then when the `--overlap-param-gather` option...

stale

**Describe the bug** The file format produced by `python examples/multimodal/clip_converter.py` does not match the format expected by `examples/multimodal/combine_mistral_clip.sh`: the converter writes `xxx\state_dict_tp_x.pt`, while the script expects `xxx/iter_0000001/mp_rank_00/model_optim_rng.pt`. **To Reproduce** - **Expected behavior** File format...

stale
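
A hedged sketch of the mismatch the report above describes and a naive repackaging of the converter's output into the layout the combine script expects. The mapping of `state_dict_tp_<r>.pt` to `mp_rank_0<r>/model_optim_rng.pt`, the `iter_0000001` directory name, and the marker file are assumptions drawn from the issue text and the usual Megatron checkpoint layout, not a documented interface; the saved state dict itself may still need key remapping.

```python
import os
import shutil

src_dir = "clip_converter_output"      # directory holding state_dict_tp_*.pt
dst_dir = "clip_megatron_checkpoint"   # directory the combine script will read
tp_size = 1                            # number of tensor-parallel ranks used by the converter

for rank in range(tp_size):
    src = os.path.join(src_dir, f"state_dict_tp_{rank}.pt")
    dst = os.path.join(dst_dir, "iter_0000001", f"mp_rank_{rank:02d}")
    os.makedirs(dst, exist_ok=True)
    # Copy each tensor-parallel shard into the per-rank checkpoint file name.
    shutil.copy(src, os.path.join(dst, "model_optim_rng.pt"))

# Megatron checkpoint loaders typically also expect this marker file with the
# latest iteration number.
with open(os.path.join(dst_dir, "latest_checkpointed_iteration.txt"), "w") as f:
    f.write("1")
```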