
The reward in step3 seems to be completely random without any noticeable increase.

Open · laoda513 opened this issue 2 years ago · 10 comments

I am testing the 1.3B training. Steps 1 and 2 have already passed, but there is no change in reward after completing step 3.

I used LoRA to train for one iteration, and the results of steps 1 and 2 are as follows:

step 1: ppl: 2.18959641456604

step 2: (screenshot)

Step 3: (screenshot)

I had ChatGPT extract the logs for step 3 and compare them with the demo logs provided in the project. I found that the absolute value of my loss is significantly smaller, and the reward seems to be completely random without any noticeable increase.

(screenshots of the step 3 logs)

laoda513 avatar May 07 '23 15:05 laoda513

My rewards even seem to be decreasing, despite the decrease in loss. (Three W&B chart screenshots, 07/05/2023.)

puyuanOT avatar May 07 '23 23:05 puyuanOT

@puyuanOT OK, I got the solution: try disabling the hybrid engine. With it enabled, the model always repeats 'a a a a a'; I'm not sure of the reason.
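For reference, a minimal sketch of what that workaround looks like, assuming the actor's DeepSpeed config carries a hybrid_engine block like the one in the repro script later in this thread (the exact flag or key exposed by your launch script may differ):

# Hypothetical sketch: in the actor's DeepSpeed config, turn the hybrid engine off
# so that step 3 generation falls back to the ordinary ZeRO-3 forward path.
ds_config['hybrid_engine'] = {'enabled': False}

With HE off, generation runs on the standard (slower) ZeRO path, which sidesteps the repeated 'a a a a a' output described above.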

laoda513 avatar May 08 '23 15:05 laoda513

@puyuanOT OK, I got the solution: try disabling the hybrid engine. With it enabled, the model always repeats 'a a a a a'; I'm not sure of the reason.

Thanks a lot! Will try it out.

puyuanOT avatar May 08 '23 17:05 puyuanOT

Perhaps it's related to this PR https://github.com/microsoft/DeepSpeedExamples/pull/470?

puyuanOT avatar May 08 '23 17:05 puyuanOT

That's another bug, I think.

laoda513 avatar May 09 '23 02:05 laoda513

@puyuanOT OK, I got the solution: try disabling the hybrid engine. With it enabled, the model always repeats 'a a a a a'; I'm not sure of the reason.

I also hit this problem and have no idea why it is happening...

REIGN12 avatar May 09 '23 03:05 REIGN12

I opened a new issue to track this: #503

laoda513 avatar May 09 '23 04:05 laoda513

Thank you for letting us know. We are now investigating whether HE has any unexpected behavior.

yaozhewei avatar May 19 '23 15:05 yaozhewei

Thank you for letting us know. We are now investigating whether HE has any unexpected behavior.

@yaozhewei I also encountered the same issue with deepspeed==0.9.0 and deepspeed==0.9.1. It can be reproduced with a very simple script; I hope it helps :) If there is any progress, could you please let me know?

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import deepspeed

model = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b", use_fast=False)
tokenizer.padding_side = 'left'

# ZeRO stage 3 with the hybrid engine enabled and inference_tp_size > 1,
# which is the combination that reproduces the broken generations.
ds_config = {
    'train_micro_batch_size_per_gpu': 4,
    'steps_per_print': 10,
    'zero_optimization': {'stage': 3,
                          'offload_param': {'device': 'none'},
                          'offload_optimizer': {'device': 'none'},
                          'stage3_param_persistence_threshold': 10000.0,
                          'stage3_max_live_parameters': 30000000.0,
                          'stage3_prefetch_bucket_size': 30000000.0,
                          'memory_efficient_linear': False},
    'fp16': {'enabled': True, 'loss_scale_window': 100},
    'gradient_clipping': 1.0,
    'prescale_gradients': False,
    'wall_clock_breakdown': False,
    'hybrid_engine': {'enabled': True, 'inference_tp_size': 8,
                      'release_inference_cache': False, 'pin_parameters': True,
                      'tp_gather_partition_size': 8}}
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()

sent = ["Human: List five action models\n\nAssistant: ", "Human: hello\n\nAssistant: "]
inputs = tokenizer(sent, padding=True, return_tensors='pt')
inputs = inputs.to(model.device)
gen_kwargs = {"max_length": 512}
output = engine.module.generate(inputs["input_ids"], **gen_kwargs)
torch.cuda.synchronize()
for o in output:
    response = tokenizer.decode(o)
    print(response)

This script uses the opt-6.7b model to generate completions. When I turn off HE, or turn it on with an inference_tp_size of 1, the results match my expectations. However, if I turn on HE with an inference_tp_size greater than 1 (such as 2 or 8), the generated output is just "((((", as shown in the screenshot below.
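In terms of the ds_config in the script above, the reported behavior corresponds roughly to the following variants; this is a sketch of the observations, with only the hybrid_engine fields changed:

# Works: hybrid engine disabled entirely.
ds_config['hybrid_engine']['enabled'] = False

# Works: hybrid engine enabled, but without tensor parallelism.
ds_config['hybrid_engine'].update({'enabled': True, 'inference_tp_size': 1})

# Broken output ("(((("): hybrid engine enabled with inference_tp_size > 1, e.g. 2 or 8.
ds_config['hybrid_engine'].update({'enabled': True, 'inference_tp_size': 8})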

This is the testing environment I used.

transformers==4.30.0.dev0
deepspeed==0.9.0

beichengus avatar Jun 01 '23 02:06 beichengus

@yaozhewei Same error when training Llama: steps 1 and 2 are normal, but step 3 just won't converge.

AlisonWen avatar Nov 20 '23 00:11 AlisonWen