Results: 2 issues by youngrok cha

### Bug description

I tried to train a Hugging Face Transformers model with `deepspeed_stage3`, but when I load the model from a checkpoint as in the code below, an error occurs. I think the checkpoint and model...

Labels: bug, docs, 3rd party, ver: 2.1.x

I've run a number of experiments, and it looks like most of the performance comes from enabling `pos_shift`.

```
python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8
6.840701103210449
python examples/eval_long_ppl.py --model_name_or_path...
```