DeepSpeedExamples

Example models using DeepSpeed

Results: 274 DeepSpeedExamples issues (sorted by recently updated)

************************ [start] Initializing Reward Model [start] ************************
[2023-11-29 14:57:02,054] [INFO] [partition_parameters.py:347:__exit__] finished initializing model - num_params = 1306, num_elems = 39.25B
>Creating model from_config took 0.365234375 seconds
>Creating model from_config took...

Question: In the SFT training phase in dschat, I found that the function `create_dataset_split` in data_utils.py pads the samples to the maximum length. So why not dynamically pad to the...
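
For comparison, dynamic padding is usually done in a collate function that pads each batch only to its longest sample. Below is a minimal sketch, assuming the dataset yields per-sample `input_ids` lists; the function and variable names are illustrative and not the actual DeepSpeed-Chat code:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def dynamic_padding_collate(batch, pad_token_id):
    # Pad each batch only to the length of its longest sample,
    # instead of a fixed max_seq_len for the whole dataset.
    input_ids = [torch.tensor(s["input_ids"], dtype=torch.long) for s in batch]
    lengths = [len(ids) for ids in input_ids]
    padded = pad_sequence(input_ids, batch_first=True, padding_value=pad_token_id)
    attention_mask = torch.zeros_like(padded)
    for i, n in enumerate(lengths):
        attention_mask[i, :n] = 1  # mark real tokens, leave padding as 0
    return {"input_ids": padded, "attention_mask": attention_mask}

# Illustrative usage; `train_dataset` and `pad_id` are placeholders:
# from torch.utils.data import DataLoader
# loader = DataLoader(train_dataset, batch_size=8,
#                     collate_fn=lambda b: dynamic_padding_collate(b, pad_id))
```

The trade-off is that batch shapes then vary from step to step, which can reduce the benefit of optimizations that rely on fixed shapes.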

1. **model**: llama-2-7b-hf
2. **execute command**: `python rw_eval.py --model_name_or_path /data/llama-2-hf/llama-2-7b-hf/`
3. **GPU**: A6000 (48G)
4. **result**
   - **first result** ![image](https://github.com/microsoft/DeepSpeedExamples/assets/18341845/4bb006cf-7999-4870-99e0-ca39f8420042)
   - **second result** ![image](https://github.com/microsoft/DeepSpeedExamples/assets/18341845/5f3d7cf8-1627-4e78-9f73-5a65ac86831c)
5. Question: Why does the rw_eval.py script...

Hi, I successfully ran the ['cifar10_deepspeed.py'](https://www.deepspeed.ai/tutorials/cifar-10/) example on a single node (2x NVIDIA 3090). Now I want to run the same program on multiple nodes (2 nodes, each with two 3090s). I...
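
For reference, moving from one node to several with the `deepspeed` launcher is typically a matter of supplying a hostfile rather than changing the script. A sketch under the assumption of two nodes with two GPUs each and passwordless SSH between them; the hostnames and the script arguments are placeholders and depend on the example version:

```
# hostfile: one line per node, slots = GPUs on that node (hostnames are placeholders)
worker-1 slots=2
worker-2 slots=2

# launch from one node; the launcher connects to the others over SSH
deepspeed --hostfile=hostfile cifar10_deepspeed.py --deepspeed
```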

Enable overlap of backward computation and gradient all-reduce. This produces a 1.05x end-to-end speedup in SFT training with my settings. See also https://github.com/microsoft/DeepSpeed/pull/4887.
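
For context, DeepSpeed's ZeRO configuration exposes an `overlap_comm` flag that overlaps gradient reduction with the backward pass. A minimal sketch of enabling it in a config dict; every value other than `overlap_comm` is an illustrative placeholder, not a setting taken from this PR:

```python
import deepspeed  # assumes deepspeed is installed

# Illustrative ZeRO config; `overlap_comm` is the relevant flag here.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,  # overlap gradient all-reduce with backward compute
    },
    "bf16": {"enabled": True},
}

# `model` and `optimizer` are placeholders for the actual SFT training objects:
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```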

Hi: After training the RLHF model (actor: pythia-6.9b, reward model: pythia-410M), I evaluated the saved checkpoint with https://github.com/EleutherAI/lm-evaluation-harness. However, it seems that some weights are missing. Here is the log: Some weights of GPTNeoXForCausalLM...

Hi team, I want to use async_pipeline and found that mii.async_pipeline is not exposed by the deepspeed-mii package. Can you add it here: https://github.com/microsoft/DeepSpeed-MII/blob/main/mii/__init__.py#L6? Thanks
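
If the maintainers expose it, the change would presumably be a one-line export. A minimal sketch of what that could look like, assuming `async_pipeline` is defined in `mii.api` next to `pipeline`; the source module is an assumption, not confirmed from the MII source:

```python
# mii/__init__.py (sketch of the requested export; the source module is an assumption)
from .api import pipeline, async_pipeline
```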

Currently, DeepSpeed-Chat saves tokenized tensors directly to disk, which consumes hundreds of GB of storage. Each string is converted to **max_seq_len**-long **attention_mask and input_ids** tensors, stored as int32...
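
For a sense of scale, padding every sample to a fixed length multiplies the footprint. A rough, illustrative calculation; the sample count and sequence length below are placeholders, not DeepSpeed-Chat defaults:

```python
# Rough, illustrative storage estimate for fixed-length int32 tensors.
num_samples = 10_000_000   # placeholder dataset size
max_seq_len = 512          # placeholder sequence length
bytes_per_token = 4        # int32
tensors_per_sample = 2     # input_ids + attention_mask

total_bytes = num_samples * max_seq_len * bytes_per_token * tensors_per_sample
print(f"{total_bytes / 1e9:.1f} GB")  # ~41.0 GB for these placeholder numbers
```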

My GPU machines do not have openmpi or any other launcher installed. I want to use the original torch.distributed to train on multiple nodes, but the error is always like this: ``` [2023-04-27...
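
DeepSpeed can also initialize from the environment variables that the stock torch launcher (torchrun) sets, without openmpi or pdsh. A minimal sketch, assuming the training script is started with torchrun on each node; the launch arguments shown in the comment are placeholders:

```python
import os
import deepspeed
import torch

# Launched with the stock torch launcher on each node, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=2 --node_rank=<0|1> \
#            --master_addr=<node0-ip> --master_port=29500 train.py
# torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT,
# which deepspeed.init_distributed() picks up, so no mpirun/pdsh is needed.
deepspeed.init_distributed(dist_backend="nccl")

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
```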

deepspeed chat
system

Hi, when I use ZeRO-3 to train a model, I get `Invalidate trace cache @ step 0: expected module 0, but got module 6`. Does anyone know the reason?