Megatron-LM
Ongoing research training transformer models at scale
Fix the bug where the optimizer never actually uses multi_tensor_applier under float16, because overflow_buf always evaluates to False. Specifically, `overflow_buf = self._dummy_overflow_buf`, and `self._dummy_overflow_buf` is initialized as `torch.tensor([0], dtype=torch.int, device='cuda')` under...
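As a quick illustration of the reported behavior, here is a minimal, hypothetical reduction of the pattern (the real Megatron-LM optimizer code differs in detail): a one-element int tensor initialized to 0 is falsy in Python, so a plain truthiness check on the dummy buffer silently routes every call to the slow per-tensor fallback.

```python
import torch

# Simplified sketch of the pattern described in the report.
dummy_overflow_buf = torch.tensor([0], dtype=torch.int, device='cuda')

def copy_this_to_that(this, that, overflow_buf=None):
    # BUG: bool(torch.tensor([0])) is False, so when the dummy buffer is passed
    # this branch is skipped and the fused multi_tensor_applier path is never taken.
    if overflow_buf:
        overflow_buf.fill_(0)
        # multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf, [this, that], 1.0)
        pass
    else:
        # Slow fallback: one copy kernel per tensor.
        for src, dst in zip(this, that):
            dst.copy_(src)

# A fix consistent with the description is to test for presence rather than truthiness:
#     if overflow_buf is not None:
```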
[BUG]
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior. The easier it is to reproduce the faster it will...
**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd...
Fixed the bug that prevents configuring datasets using train-data-path, valid-data-path, and test-data-path. When the --split parameter is not configured, it is set to the default value 969,...
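A hedged sketch of the argument handling this fix implies (the flag names match Megatron-LM's CLI, but the exact default ratio and the condition are assumptions, not copied from the patch): apply a default `--split` only when a single blended `--data-path` is used, and leave it unset when per-split paths are given.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data-path', nargs='*', default=None)
parser.add_argument('--train-data-path', nargs='*', default=None)
parser.add_argument('--valid-data-path', nargs='*', default=None)
parser.add_argument('--test-data-path', nargs='*', default=None)
parser.add_argument('--split', type=str, default=None)
args = parser.parse_args()

# Per-split paths and a split ratio are mutually exclusive ways of building datasets.
uses_per_split_paths = any(
    p is not None for p in (args.train_data_path, args.valid_data_path, args.test_data_path)
)
if args.split is None and not uses_per_split_paths:
    # Assumed default train/valid/test ratio; only meaningful for a blended --data-path.
    args.split = '969, 30, 1'
```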
**Describe the bug** When I configure datasets for a training task using train-data-path, valid-data-path, and test-data-path, running the training task results in an error. The error message is shown in...
**Your question** When we want to train LLMs on a large collection of corpora, I understand that the usual approach is to provide the documents with the following...
**Your question** Ask a clear and concise question about Megatron-LM. Is `backward` below supposed to be `forward`?
**Your question** Is there a way to start training a llama2 model with a llama3 tokenizer? I plan on doing all the pretraining myself; if so, and someone can provide...
**Describe the bug** The order of the (FP16/BF16) parameters in the buffer differs from the model's forward execution order. As a result, when the `--overlap-param-gather` command...
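A purely conceptual sketch of the mismatch (hypothetical names, heavily simplified; this is not the actual overlap logic): overlapped param gather implicitly assumes the bucket order in the buffer matches the order in which forward consumes parameters, so any divergence means forward waits on, or reads from, the wrong bucket.

```python
# Order of parameters as laid out in the (FP16/BF16) param buffer.
buffer_order = ['mlp.weight', 'attn.weight', 'embed.weight']
# Order in which the model's forward pass actually uses the parameters.
forward_order = ['embed.weight', 'attn.weight', 'mlp.weight']

for step, needed in enumerate(forward_order):
    prefetched = buffer_order[step]  # bucket whose all-gather is ready at this step
    if prefetched != needed:
        print(f'step {step}: forward needs {needed!r} but the gathered bucket holds {prefetched!r}')
```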
**Describe the bug** The file format output by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`: the converter writes `xxx/state_dict_tp_x.pt`, not `xxx/iter_0000001/mp_rank_00/model_optim_rng.pt`. **To Reproduce** - **Expected behavior** File format...
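If the mismatch is only in directory layout, as the two paths quoted in the report suggest, a small rearrangement like the hypothetical sketch below might bridge the two tools; if the checkpoint contents themselves also differ, a real conversion is still needed. All paths and the rank pattern here are assumptions taken from the report, not from the repo.

```python
import shutil
from pathlib import Path

src_dir = Path('clip_converter_output')   # holds state_dict_tp_0.pt, state_dict_tp_1.pt, ...
dst_dir = Path('clip_megatron_layout')    # layout expected by combine_mistral_clip.sh

for shard in sorted(src_dir.glob('state_dict_tp_*.pt')):
    rank = int(shard.stem.rsplit('_', 1)[-1])
    target = dst_dir / 'iter_0000001' / f'mp_rank_{rank:02d}' / 'model_optim_rng.pt'
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(shard, target)
```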