
Example models using DeepSpeed

Results 323 DeepSpeedExamples issues

```
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|cri_loss: 0.2144775390625|unsuper_loss: 0.0
average reward score: 0.20556640625
-------------------------------------------------------------------------------------
epoch: 0|step: 260|ppo_ep: 1|act_loss: 0.1915283203125|cri_loss: 0.326171875|unsuper_loss: 0.0
average reward score: 0.205810546875
-------------------------------------------------------------------------------------
epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.1837158203125|cri_loss: 0.2259521484375|unsuper_loss:...
```
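To track metrics like `act_loss` and `cri_loss` across runs, the DeepSpeed-Chat step-3 log lines above can be parsed into numeric fields. The field names come from the log itself; the regex and helper function below are an illustrative sketch, not part of the repository.

```python
import re

# Pattern matching one PPO log line as printed in step 3 of DeepSpeed-Chat
# (field names taken from the log excerpt above).
LINE_RE = re.compile(
    r"epoch: (?P<epoch>\d+)\|step: (?P<step>\d+)\|ppo_ep: (?P<ppo_ep>\d+)\|"
    r"act_loss: (?P<act_loss>-?[\d.]+)\|cri_loss: (?P<cri_loss>-?[\d.]+)\|"
    r"unsuper_loss: (?P<unsuper_loss>-?[\d.]+)"
)

def parse_ppo_line(line: str) -> dict:
    """Extract epoch/step counters and loss values from a PPO log line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {k: (int(v) if k in ("epoch", "step", "ppo_ep") else float(v))
            for k, v in m.groupdict().items()}

sample = ("epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|"
          "cri_loss: 0.2144775390625|unsuper_loss: 0.0")
print(parse_ppo_line(sample)["step"])  # → 259
```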

deepspeed chat
modeling

1. Add an arg named `add_eot_token` so that the EOT token is not added when `args.add_eot_token` is False (fixes step 2 for Llama-2). 2. Remove some redundant code, such as `chosen_token["input_ids"] = chosen_token["input_ids"]`. 3. Avoid...
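The snippet above proposes gating the end-of-text token behind an `add_eot_token` flag. A minimal argparse sketch of how such a flag could work is shown below; the flag name follows the issue text, while the token string and variable names are assumptions for illustration.

```python
import argparse

# Hypothetical sketch: make appending the end-of-conversation token opt-in,
# so tokenizers that already emit EOS (e.g. Llama-2) are not double-tokened.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--add_eot_token",
    action="store_true",
    help="Append an end-of-text token to each sample; leave unset for "
         "models whose tokenizer already handles EOS.",
)
args = parser.parse_args(["--add_eot_token"])

end_of_conversation = "<|endoftext|>"  # assumed token string
sample_text = "Hello"
if args.add_eot_token:
    sample_text = sample_text + end_of_conversation
print(sample_text)  # → Hello<|endoftext|>
```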

Hello, I wish you good work. I got stuck at one point and wanted to get an answer from you. When we first set up the tokenizer structure,...

When I use 4 × A100 80G to run step 3 with llama2-7b (actor model) and tiny-llama-1.1B (ref model), it uses 53848 MB of memory during generation and 79610 MB during training. When I use 8...

| Model | Dataset | Peft | Step | Acc | PS |
|:---:|:---:|:---:|:---:|:---:|:---:|
| llama2-7b | Dahoas/full-hh-rlhf | LoRA | 2 | 0.64 | \ |
| llama2-7b |...

```
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44...
```
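Illegal-memory-access errors like the one above are often reported at an op unrelated to the real fault, because CUDA kernel launches are asynchronous. One common first debugging step (a general PyTorch/CUDA technique, not specific to this issue) is to force synchronous launches via the standard `CUDA_LAUNCH_BLOCKING` environment variable, which must be set before CUDA is initialized:

```python
import os

# Force synchronous CUDA kernel launches so the error is raised at the
# kernel that actually faulted. Set this before importing torch (or any
# library that initializes CUDA), e.g. at the very top of the launch script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```

With this set, the stack trace points at the failing operation, at the cost of slower execution; it should be removed once the bug is found.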

deepspeed chat

### With other settings kept the same, when the hybrid engine is enabled, the actor model in Step 3 generates the same token over and over until reaching the max length of...

Hello, good work everyone. Is this structure suitable for training these models? If not, what should be changed, or could support for it be added?

When I use ZeRO-3, during network initialization, if my Llama is rewritten by inheritance as follows:
```
class FlashLlamaModel(LlamaModel):
    def __init__(self, config: LlamaConfig):
        super().__init__(config)

class FlashLlamaForCausalLM(LlamaForCausalLM):
    def __init__(self, config):...
```

Hello! Did anyone meet the following bug when using zero_stage3 for Llama2?
```
step3_rlhf_finetuning/rlhf_engine.py:61 in __init__
  58 │   self.num_total_iters = num_total_iters
  59 │...
```