DeepSpeedExamples
Example models using DeepSpeed
Hi there, the tutorial https://www.deepspeed.ai/tutorials/bert-finetuning/#loading-huggingface-and-tensorflow-pretrained-models makes clear how to load HF and TF checkpoints into DeepSpeed. What if we want to load a DeepSpeed checkpoint, like one from the Bing BERT...
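For reference, a minimal sketch of that reverse direction, restoring a checkpoint that DeepSpeed itself saved; the toy model, config path, and checkpoint directory below are placeholders, not values from the tutorial:

```python
import torch
import deepspeed

# Stand-in model; in the Bing BERT case this would be the BERT network.
model = torch.nn.Linear(768, 2)
model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # assumed config path
)
# load_checkpoint() restores module, optimizer, and scheduler state from a
# directory written by save_checkpoint(); it returns the resolved path and
# whatever client state was saved alongside it.
load_path, client_state = model_engine.load_checkpoint("/path/to/checkpoints")
```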
Error encountered running DeepSpeedExamples/BingBertSquad/run_squad_deepspeed.sh

```console
$ ./run_squad_deepspeed.sh 16 ~/models/bert-base-uncased/pytorch_model.bin ~/datasets/squad_data ~/output
11/23/2021 15:42:24 - INFO - __main__ - Loading Pretrained Bert Encoder from: /home/bduser/models/bert-base-uncas/pytorch_model.bin
VOCAB SIZE: 30528
Traceback (most recent...
```
Error occurred running bing_bert/ds_train_bert_nvidia_data_bsz64k_seq128.sh

```console
Detected CUDA files, patching ldflags
Emitting ninja build file /home/bduser/.cache/torch_extensions/py38_cu114/fused_lamb/build.ninja...
Building extension module fused_lamb...
Allowing ninja to set a default number of workers... (overridable by setting...
```
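One way to narrow down build failures like this is to trigger the JIT build of the op directly, outside of training; a sketch, assuming DeepSpeed's op-builder API is available in the installed version:

```python
# Attempt to build/load the fused_lamb extension in isolation; the same
# ninja build runs here, so CUDA/compiler mismatches surface immediately.
from deepspeed.ops.op_builder import FusedLambBuilder

fused_lamb = FusedLambBuilder().load()
print("fused_lamb built and loaded:", fused_lamb is not None)
```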
Following the bing_bert tutorial, my deepspeed_config is:

```json
{
  "train_batch_size": 4096,
  "train_micro_batch_size_per_gpu": 32,
  "steps_per_print": 1000,
  "prescale_gradients": false,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 6e-3,
      "betas": [0.9, 0.99],...
```
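The batch-size fields in this config must stay mutually consistent with the number of GPUs, so if the run fails on startup that is worth checking first. A sketch passing the same values as a Python dict; the gradient accumulation steps and the 16-GPU launch are assumptions, not values from the tutorial:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for the BERT model
ds_config = {
    # DeepSpeed requires: train_batch_size ==
    #   train_micro_batch_size_per_gpu * gradient_accumulation_steps * num_gpus
    # e.g. 32 * 8 * 16 = 4096 on an assumed 16-GPU launch.
    "train_batch_size": 4096,
    "train_micro_batch_size_per_gpu": 32,
    "gradient_accumulation_steps": 8,
    "steps_per_print": 1000,
    "optimizer": {"type": "Adam", "params": {"lr": 6e-3, "betas": [0.9, 0.99]}},
}
# Must be launched via the deepspeed launcher with 16 GPUs total for the
# identity above to hold; on fewer GPUs, initialize() rejects the config.
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```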
Collecting the datasets needed for pretraining is a bit of work, especially when downloading from lots of different URLs behind a firewall. https://github.com/microsoft/DeepSpeedExamples/tree/25d73cf73fb3dc66faefa141b7319526555be9fc/Megatron-LM-v1.1.5-ZeRO3#datasets I see that some version of these...
I am trying DeepSpeed inference with the GPT-Neo 1.3B model. I am using the example [here](https://www.deepspeed.ai/tutorials/inference-tutorial/#end-to-end-gpt-neo-27b-inference) for reference.

```
# Filename: example.py
import os
import deepspeed
import datetime
import torch
from transformers...
```
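For comparison, here is the shape of the tutorial's example adapted to the 1.3B checkpoint; the dtype and generation arguments are illustrative choices, and `replace_with_kernel_inject` is the kernel-injection flag the inference tutorial uses:

```python
# Filename: example.py (sketch, single GPU; dtype/args are assumptions)
import torch
import deepspeed
from transformers import pipeline

generator = pipeline("text-generation",
                     model="EleutherAI/gpt-neo-1.3B", device=0)
# Wrap the HF model with DeepSpeed's inference engine.
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=1,                        # no tensor parallelism on one GPU
    dtype=torch.float16,
    replace_with_kernel_inject=True,  # swap in DeepSpeed's fused kernels
)
print(generator("DeepSpeed is", max_length=50, do_sample=False))
```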
Machine translation usually takes dynamically sized batches composed of X tokens, rather than X sentences, as training input. I'm wondering why DeepSpeed requires specifying `train_batch_size` and `train_micro_batch_size_per_gpu`, both of which...
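The short answer is that DeepSpeed fixes the per-step batch size as a count of samples and enforces an arithmetic identity among the three knobs; a worked example with assumed values:

```python
# DeepSpeed requires: train_batch_size ==
#   train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
world_size = 8                    # assumed number of GPUs
train_micro_batch_size_per_gpu = 16
gradient_accumulation_steps = 4
train_batch_size = (
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
)
assert train_batch_size == 512    # a fixed count of samples per step, which
                                  # is why token-based dynamic batching does
                                  # not map onto these fields directly
```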
Bing BERT
Hi guys, I have been trying to run the Bing BERT experiment, but so far I can't get it to run. My config includes:

```json
"datasets": {
  "wiki_pretrain_dataset": "/data/bert/bnorick_format/128/wiki_pretrain",
  "bc_pretrain_dataset": "/data/bert/bnorick_format/128/bookcorpus_pretrain"
},...
```
I am trying to follow the example at https://www.deepspeed.ai/tutorials/bert-pretraining/. The section on getting the datasets says "Note: Downloading and pre-processing instructions are coming soon." I tried Googling, but those datasets...
Fixes the issue reported in https://github.com/microsoft/DeepSpeed/issues/1243