
Example models using DeepSpeed

Results 323 DeepSpeedExamples issues

```
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|cri_loss: 0.2144775390625|unsuper_loss: 0.0
average reward score: 0.20556640625
-------------------------------------------------------------------------------------
epoch: 0|step: 260|ppo_ep: 1|act_loss: 0.1915283203125|cri_loss: 0.326171875|unsuper_loss: 0.0
average reward score: 0.205810546875
-------------------------------------------------------------------------------------
epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.1837158203125|cri_loss: 0.2259521484375|unsuper_loss:...
```
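To track metrics like `act_loss` and `cri_loss` across runs, the DeepSpeed-Chat step-3 log lines above can be parsed into numeric fields. The field names come from the log itself; the regex and helper function below are an illustrative sketch, not part of the repository.

```python
import re

# Pattern matching one PPO log line as printed in step 3 of DeepSpeed-Chat
# (field names taken from the log excerpt above).
LINE_RE = re.compile(
    r"epoch: (?P<epoch>\d+)\|step: (?P<step>\d+)\|ppo_ep: (?P<ppo_ep>\d+)\|"
    r"act_loss: (?P<act_loss>-?[\d.]+)\|cri_loss: (?P<cri_loss>-?[\d.]+)\|"
    r"unsuper_loss: (?P<unsuper_loss>-?[\d.]+)"
)

def parse_ppo_line(line: str) -> dict:
    """Extract epoch/step counters and loss values from a PPO log line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {k: (int(v) if k in ("epoch", "step", "ppo_ep") else float(v))
            for k, v in m.groupdict().items()}

sample = ("epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|"
          "cri_loss: 0.2144775390625|unsuper_loss: 0.0")
print(parse_ppo_line(sample)["step"])  # → 259
```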

deepspeed chat
modeling

1. Add an arg named `add_eot_token` so that the EOT token is not added when `args.add_eot_token` is False (fixes step 2 for Llama-2). 2. Remove some redundant code, such as `chosen_token["input_ids"] = chosen_token["input_ids"]`. 3. Avoid...
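The snippet above proposes gating the end-of-text token behind an `add_eot_token` flag. A minimal argparse sketch of how such a flag could work is shown below; the flag name follows the issue text, while the token string and variable names are assumptions for illustration.

```python
import argparse

# Hypothetical sketch: make appending the end-of-conversation token opt-in,
# so tokenizers that already emit EOS (e.g. Llama-2) are not double-tokened.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--add_eot_token",
    action="store_true",
    help="Append an end-of-text token to each sample; leave unset for "
         "models whose tokenizer already handles EOS.",
)
args = parser.parse_args(["--add_eot_token"])

end_of_conversation = "<|endoftext|>"  # assumed token string
sample_text = "Hello"
if args.add_eot_token:
    sample_text = sample_text + end_of_conversation
print(sample_text)  # → Hello<|endoftext|>
```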

Hello, I wish you good work. I got stuck at one point and wanted to get an answer from you. When we first set up the tokenizer structure,...

When I use 4 × A100 80G to run step 3 with llama2-7b (actor model) and tiny-llama-1.1B (ref model), it uses 53848 MB of memory during generation and 79610 MB during training. When I use 8...

| Model | Dataset | Peft | Step | Acc | PS |
|:---:|:---:|:---:|:---:|:---:|:---:|
| llama2-7b | Dahoas/full-hh-rlhf | LoRA | 2 | 0.64 | \ |
| llama2-7b |...

```
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44...
```
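Illegal-memory-access errors like the one above are often reported at an op unrelated to the real fault, because CUDA kernel launches are asynchronous. One common first debugging step (a general PyTorch/CUDA technique, not specific to this issue) is to force synchronous launches via the standard `CUDA_LAUNCH_BLOCKING` environment variable, which must be set before CUDA is initialized:

```python
import os

# Force synchronous CUDA kernel launches so the error is raised at the
# kernel that actually faulted. Set this before importing torch (or any
# library that initializes CUDA), e.g. at the very top of the launch script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```

With this set, the stack trace points at the failing operation, at the cost of slower execution; it should be removed once the bug is found.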

deepspeed chat

### With other settings kept the same, when the hybrid engine is enabled, the actor model in Step 3 generates the same token over and over until reaching the max length of...

Hello, good work everyone. Is this structure suitable for training these models? If not, what should be changed, or could support for it be added?

When I use ZeRO-3, during network initialization, if my Llama is rewritten by inheritance as follows:
```
class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):...
```

Hello! Did anyone meet the following bug when using zero_stage3 for Llama2?
```
step3_rlhf_finetuning/rlhf_engine.py:61 in __init__
  58 │   self.num_total_iters = num_total_iters
  59 │...
```