Harryis Wang

Results: 7 issues by Harryis Wang

Hi! I encountered an issue when doing Step 3 (SFT). The function "get_accelerate_model" in qlora_model.py sets adapter_name="lora_default". This results in an error where the trainable parameters are set to...
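To illustrate what I mean, here is a minimal sketch (the model name and LoRA settings are placeholders, not the repo's actual qlora_model.py config) of how the adapter name passed to PEFT determines which parameters end up trainable:

```
# Sketch only: placeholder model/config, not the repo's actual settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # hypothetical small model
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# The adapter is registered under the name "lora_default" (as in get_accelerate_model);
# any downstream code that looks up the "default" adapter will not find these parameters.
model = get_peft_model(base, lora_cfg, adapter_name="lora_default")
model.print_trainable_parameters()  # check whether the LoRA parameters are actually trainable

# Inspect which parameter names carry the adapter tag
for name, p in model.named_parameters():
    if p.requires_grad:
        print(name)
```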

Hi! I encountered a bug when doing Step 3 (Principle Engraving). I used self_align_merged.json, which was created from "self_align_32shards_*.jsonl" and "vicuna_dummy_data.json", to fine-tune the base model. However, I find...
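For reference, this is roughly how I built self_align_merged.json; a minimal sketch, assuming each shard is JSONL (one JSON object per line) and vicuna_dummy_data.json is a plain JSON list, so the field names and layout may differ from the repo's actual merge script:

```
# Sketch of the merge step; the file layout is an assumption on my side.
import glob
import json

merged = []

# Each self-align shard is assumed to be JSONL: one JSON object per line.
for shard in sorted(glob.glob("self_align_32shards_*.jsonl")):
    with open(shard) as f:
        merged.extend(json.loads(line) for line in f if line.strip())

# The Vicuna dummy data is assumed to be a plain JSON list of examples.
with open("vicuna_dummy_data.json") as f:
    merged.extend(json.load(f))

with open("self_align_merged.json", "w") as f:
    json.dump(merged, f, ensure_ascii=False, indent=2)
```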

Hello, thanks for your awesome work and code! However, I encountered some confusion while trying to understand how you generated the TGRT Self-Instruction data. You mentioned in the article that you first...

### Required prerequisites

- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-Alignment/safe-rlhf/discussions) that this hasn't already been reported. (+1 or comment...


Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I notice there is a setting:

```
if model_args.use_peft is True:
    ref_model = None
    ...
```
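My current understanding of why ref_model can be None here (a sketch of the idea, not the repo's actual code): when the policy is a LoRA/PEFT model, the frozen base weights already act as the reference policy, so reference log-probs can be computed by temporarily disabling the adapters:

```
# Sketch: `model` is assumed to be a peft.PeftModel wrapping the frozen base model.
import torch

def reference_logits(model, input_ids, attention_mask):
    # With adapters disabled, the forward pass uses only the frozen base
    # weights, which is exactly the reference model DPO needs.
    with torch.no_grad():
        with model.disable_adapter():
            return model(input_ids=input_ids, attention_mask=attention_mask).logits
```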

Hello, I ran into the following problem when using the remote RM:

1. Start the remote RM:

```
set -x
python -m openrlhf.cli.serve_rm \
    --reward_pretrain /mnt/GeneralModel/share/OpenRLHF/Llama-3-8b-rm-700k \
    --port 5000 \
    --bf16 \
    --attn_implementation flash_attention_2 \
    --normalize_reward \
    --max_len 8192 \
    --batch_size 16 ...
```
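For context, this is how I query the server once it is up. The route name and payload shape below are assumptions on my side; the actual endpoint and request format are defined in openrlhf.cli.serve_rm:

```
# Sketch of a client call; the /get_reward route and the payload are assumptions,
# check openrlhf.cli.serve_rm for the actual endpoint and request format.
import requests

resp = requests.post(
    "http://localhost:5000/get_reward",                # port matches --port 5000 above
    json={"query": ["Human: hello\nAssistant: hi"]},   # hypothetical payload shape
    timeout=60,
)
print(resp.json())
```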

Hello! Thanks for your awesome work! I notice that, in the training process, the model is finetuned to directly predict the 66 unsafe sentences without corresponding questions/instructions. However, in the...