Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Results: 128 Megatron-DeepSpeed issues, sorted by recently updated.

```
(gh_Megatron-DeepSpeed_yk) ub2004@ub2004-B85M-A0:~/nndev/Megatron-DeepSpeed_yk$ python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>
(gh_Megatron-DeepSpeed_yk) ub2004@ub2004-B85M-A0:~/nndev/Megatron-DeepSpeed_yk$ pip...
```

Hi, I have used the DeepSpeed framework to train a GPT-117M model. When I evaluate model performance on WikiText-103, the result from tasks/eval_harness/evaluate.py differs from first converting the checkpoint to Megatron format and using tasks/main.py...

This adds pretraining using [UL2](https://arxiv.org/abs/2205.05131) for encoder-decoder, non-causal decoder-only, and causal decoder-only models. I have not yet run large-scale tests to see whether it yields the desired training improvements,...
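For readers unfamiliar with UL2, the core idea is a mixture of denoising objectives sampled per example. Below is a minimal, self-contained sketch of that sampling step; the denoiser names (R/X/S) follow the paper, but the span-length and corruption-rate values, the single-span simplification, and the function name `sample_denoiser_task` are illustrative assumptions, not code from this PR.

```python
import random

# Hedged sketch of UL2-style mixture-of-denoisers sampling.
# Assumptions: rates loosely follow the UL2 paper; real UL2 corrupts
# many spans per example, while this sketch uses one span for clarity.
DENOISERS = {
    "R": {"mean_span": 3, "corrupt_rate": 0.15},  # regular span corruption
    "X": {"mean_span": 32, "corrupt_rate": 0.5},  # extreme corruption
    "S": None,                                    # sequential (prefix-LM)
}

def sample_denoiser_task(tokens, rng=random):
    """Pick a denoiser uniformly and return (mode, inputs, targets)."""
    mode = rng.choice(list(DENOISERS))
    if mode == "S":
        # Prefix-LM: split at a random pivot; predict the suffix.
        pivot = rng.randrange(1, len(tokens))
        return mode, tokens[:pivot], tokens[pivot:]
    cfg = DENOISERS[mode]
    n_corrupt = max(1, int(len(tokens) * cfg["corrupt_rate"]))
    start = rng.randrange(0, len(tokens) - n_corrupt + 1)
    # Replace the corrupted span with a single sentinel token.
    inputs = tokens[:start] + ["<mask>"] + tokens[start + n_corrupt:]
    targets = tokens[start:start + n_corrupt]
    return mode, inputs, targets
```

The causal and non-causal decoder-only variants in the PR would consume the same (inputs, targets) pairs, differing in the attention mask applied over the prefix.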

Hello, when I run the script to train a GPT model, I get an assertion error: "Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed." The script...

```
def _merge_zero_shards(param_base_path, state, tp_degree, slice_shape):
    slices = []
    for tp_index in range(tp_degree):
        prefix_path = os.path.join(param_base_path, str(tp_index), f"{state}")
        paths = sorted(list(glob.glob(f"{prefix_path}.0*")))
        #print(paths)
        shards = [torch.load(p) for p in paths]
        slice...
```

See https://arxiv.org/abs/2212.10554.

1. I trained a GPT-2 model with pipeline parallelism; the Flops Profiler enabled in the DeepSpeed config is useless, it outputs nothing. 2. So I added some code like this: ``` prof = None...

I used the following installation method, but received an error that has gone unresolved for several days: git clone https://github.com/NVIDIA/apex cd apex pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check ....

Is Megatron-DeepSpeed only targeting specific models such as GPT-2? Can it support parallel partitioning of relatively lightweight models such as CLIP?

Hi, I'm trying to continue pre-training bloom-560m on my own dataset on a single GPU. I modified [this script](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/examples/pretrain_gpt_single_node.sh) to fit my case. However, I cannot figure out how...