Results 14 issues of Heyang Sun

## Description This is an example of TRL reward modeling, a form of RLHF, that uses the Anthropic/hh-rlhf dataset, which is on our whitelist. ### 1. Why the change? Enable RM on...
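For context, reward modeling on preference pairs such as those in Anthropic/hh-rlhf is commonly trained with a pairwise loss that pushes the score of the chosen response above the rejected one. The sketch below is a plain-Python illustration of that loss for a single pair, not the code from this PR; the function name `pairwise_reward_loss` and the scalar inputs are assumptions made for the example.

```python
import math


def pairwise_reward_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Return -log(sigmoid(chosen_reward - rejected_reward)) for one pair.

    Hypothetical helper for illustration only. The loss is small when the
    chosen response scores well above the rejected one, and large otherwise.
    """
    diff = chosen_reward - rejected_reward
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() is only ever called on a non-positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_pairwise_loss(pairs):
    """Average the pairwise loss over (chosen_reward, rejected_reward) tuples."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)
```

In a real training run the two rewards would come from the reward model's scalar head applied to the chosen and rejected texts, and the loss would be computed on tensors rather than floats.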

## Description Use DeepSpeed ZeRO-3 to split and distribute the layers of a large model across multiple XPUs and run QLoRA fine-tuning. ### 1. Why the change? As above. ### 2....
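As a rough illustration of what a ZeRO-3 setup involves, the sketch below builds a minimal DeepSpeed configuration as a Python dict and serializes it to JSON. The keys are standard DeepSpeed config fields, but the values (micro-batch size, bf16) are assumptions chosen for the example and are not taken from this PR.

```python
import json

# Minimal ZeRO stage 3 configuration (illustrative values, not from the PR).
ds_config = {
    # Per-device micro-batch size; assumed to be 1 for a memory-constrained run.
    "train_micro_batch_size_per_gpu": 1,
    # Train in bfloat16.
    "bf16": {"enabled": True},
    # Stage 3 partitions parameters, gradients, and optimizer states
    # across the participating devices.
    "zero_optimization": {"stage": 3},
}

# DeepSpeed accepts either a dict like the one above or a path to a JSON
# file with the same content.
config_text = json.dumps(ds_config, indent=2)
```

The text in `config_text` can be written to a file and passed to the launcher or trainer wherever a DeepSpeed config path is expected.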

Tried to load the model and distribute it to devices layer by layer, using the DeepSpeed ZeRO-3 context manager as below:

```python
with ds.zero.Init(config_dict_or_path=deepspeed):
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        config=model_config,
        torch_dtype=torch.bfloat16,
        ignore_mismatched_sizes=True,
        ...
```

## Description ### 1. Why the change? #11167 ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ] N/A - [...