Megatron-LM

Ongoing research training transformer models at scale

Results: 294 Megatron-LM issues, sorted by recently updated

## Why? The loader import error is swallowed even when the root cause is not that loader_X.py is missing. For example, if I don't have `transformers` installed, it still printed loader_X not...
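A minimal sketch of the pattern being described, assuming the converter resolves loaders with `importlib`; the `load_loader` helper and module naming below are illustrative, not the repository's exact code. The point is to distinguish a missing loader module from a missing dependency imported inside the loader:

```python
import importlib
import sys

def load_loader(name):
    """Hypothetical loader resolution for tools/checkpoint plugins.

    Catching every ModuleNotFoundError and reporting "loader not found"
    hides the real cause when the loader module itself imports fine but one
    of its dependencies (e.g. `transformers`) is missing.
    """
    module_name = f"loader_{name}"
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        # Only report "loader not found" when the loader module itself is missing;
        # otherwise re-raise so the underlying missing dependency is visible.
        if e.name == module_name:
            sys.exit(f"Unable to load {module_name}. Exiting.")
        raise
```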


`--workers` is required, as in [preprocess_data.py](https://github.com/NVIDIA/Megatron-LM/blob/0052bf0de70b266d8648e2655da16f7eb2c9ca56/tools/preprocess_data.py#L223), but it is missing from the readme.
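For illustration, an invocation that passes `--workers` explicitly might look like the following; the input paths and tokenizer choice here are placeholders, not values taken from the readme:

```bash
# Example preprocessing run with the required --workers flag set.
python tools/preprocess_data.py \
    --input my_corpus.json \
    --output-prefix my_corpus \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file gpt2-vocab.json \
    --merge-file gpt2-merges.txt \
    --append-eod \
    --workers 4
```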


I am trying to convert the weights for `vicuna-7b-v1.5` from Hugging Face Transformers (https://huggingface.co/lmsys/vicuna-7b-v1.5) to be used with Megatron-LM. I am using `tools/checkpoint/convert.py` to do the conversion. The command...
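As a rough, hypothetical shape of such a conversion command (the flag names, loader, and saver choices vary between Megatron-LM versions; check `python tools/checkpoint/convert.py --help` for the exact arguments, and the directories below are placeholders):

```bash
python tools/checkpoint/convert.py \
    --model-type GPT \
    --loader llama_mistral \
    --saver megatron \
    --load-dir /path/to/vicuna-7b-v1.5 \
    --save-dir /path/to/megatron_ckpt \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1
```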

**Describe the bug** The usage and description of `--loss-scale` are inconsistent. The argument is expected to be a positive power of 2, but ConstantGradScaler sets loss-scale to...
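For context, a simplified sketch of how a constant loss scaler uses that value (this is an illustrative class, not Megatron's ConstantGradScaler): the scale multiplies the loss before backward and its inverse unscales the gradients, which is why a positive power of two is preferred, since scaling and unscaling are then exact in floating point.

```python
import torch

class SimpleConstantGradScaler:
    """Illustrative constant loss scaler (not the repository's class)."""

    def __init__(self, scale: float):
        assert scale > 0.0
        self._scale = torch.tensor([scale], dtype=torch.float32)

    @property
    def scale(self):
        return self._scale

    @property
    def inv_scale(self):
        return 1.0 / self._scale

# Typical mixed-precision step: scale the loss before backward,
# then unscale the gradients before the optimizer update.
#   scaler = SimpleConstantGradScaler(4096.0)
#   (loss * scaler.scale).backward()
#   for p in model.parameters():
#       if p.grad is not None:
#           p.grad.mul_(scaler.inv_scale)
```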

In certain virtualized environments there is no shared storage. Both the source code and the data are stored (replicated) on each worker node's local storage. The code sections below only load data...
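A minimal sketch of the distinction being described, with illustrative function names rather than Megatron's actual code: with a shared filesystem it is enough for global rank 0 to build the dataset cache once, but without shared storage every node needs its own copy, so the build has to happen on the first rank of each node.

```python
import os
import torch

def build_index_cache(path):
    ...  # expensive one-time preprocessing that writes cache files next to `path`

def prepare_data_shared_fs(path):
    # One node writes, all nodes read the shared copy.
    if torch.distributed.get_rank() == 0:
        build_index_cache(path)
    torch.distributed.barrier()

def prepare_data_local_fs(path):
    # Every node writes its own local copy (local rank 0 per node).
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if local_rank == 0:
        build_index_cache(path)
    torch.distributed.barrier()
```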

**Describe the bug**

```python
def broadcast_params(self):
    """ Syncs parameters across all DP ranks. """
    for param in self.module.parameters():
        is_expert_parallel = not getattr(param, 'allreduce', True)
        if is_expert_parallel:
            torch.distributed.broadcast(
                param.data,
                src=torch.distributed.get_process_group_ranks(self.expert_data_parallel_group),
                group=self.expert_data_parallel_group,
                ...
```
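For context, `torch.distributed.broadcast` takes `src` as a single global rank, while `torch.distributed.get_process_group_ranks` returns the full list of ranks in a group. A sketch of the broadcast written against the group's first rank follows; this is an assumption about the intended behaviour, not a confirmed fix, and the standalone function is illustrative only:

```python
import torch

def broadcast_expert_params(module, expert_data_parallel_group):
    """Illustrative sketch: broadcast expert-parallel parameters from the
    first rank of the expert data-parallel group."""
    # get_process_group_ranks returns a list of global ranks; broadcast expects one.
    src_rank = torch.distributed.get_process_group_ranks(expert_data_parallel_group)[0]
    for param in module.parameters():
        is_expert_parallel = not getattr(param, 'allreduce', True)
        if is_expert_parallel:
            torch.distributed.broadcast(
                param.data,
                src=src_rank,
                group=expert_data_parallel_group,
            )
```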

`forward_backward_pipelining_with_interleaving` has a branch that enables `config.overlap_p2p_comm`; why doesn't `forward_backward_pipelining_without_interleaving` have one?
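For readers unfamiliar with the option, a simplified sketch of what overlapping p2p communication means in a pipeline schedule (an assumption-laden illustration, not Megatron's scheduler code): the send/recv for the neighbouring stage is issued asynchronously and only waited on after the current chunk's compute, instead of blocking before it.

```python
import torch

def step_with_overlap(compute_chunk, send_tensor, recv_tensor, prev_rank, next_rank):
    """Illustrative overlapped pipeline step using async batched p2p ops."""
    ops = [
        torch.distributed.P2POp(torch.distributed.isend, send_tensor, next_rank),
        torch.distributed.P2POp(torch.distributed.irecv, recv_tensor, prev_rank),
    ]
    reqs = torch.distributed.batch_isend_irecv(ops)  # communication starts here
    output = compute_chunk()                         # compute overlaps with the transfer
    for req in reqs:
        req.wait()                                   # synchronize before using recv_tensor
    return output
```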


**Describe the bug** When I try to run single-GPU T5 pretraining with the script `examples/pretrain_t5.sh`, it fails with the following error: > ModuleNotFoundError: No module named 'scaled_softmax_cuda' It seems that...
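`scaled_softmax_cuda` is one of the optional fused CUDA kernel extensions, so this error typically means the extension was never built in the current environment. A sketch of the usual guarded-import pattern for such an optional kernel follows; the extension's exact API is simplified here and the fallback function is a placeholder, not the repository's code path:

```python
# Try the fused kernel, fall back to plain PyTorch if the extension is absent.
try:
    import scaled_softmax_cuda  # built from Megatron's fused-kernel sources
    HAVE_FUSED_SOFTMAX = True
except ModuleNotFoundError:
    scaled_softmax_cuda = None
    HAVE_FUSED_SOFTMAX = False

def softmax(x, scale):
    if HAVE_FUSED_SOFTMAX:
        return scaled_softmax_cuda.forward(x, scale)  # fused CUDA path (simplified call)
    return (x * scale).softmax(dim=-1)                # plain PyTorch fallback
```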