Harryis Wang

Results: 7 issues by Harryis Wang

Hi! I encountered an issue when doing Step 3 (SFT). The function "get_accelerate_model" in qlora_model.py sets adapter_name="lora_default". This results in an error where the trainable parameters are set to...
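To illustrate what I mean, here is a minimal sketch (the model name and LoRA settings are placeholders, not the repo's actual qlora_model.py config) of how the adapter name passed to PEFT determines which parameters end up trainable:

```
# Sketch only: placeholder model/config, not the repo's actual settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # hypothetical small model
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# The adapter is registered under the name "lora_default" (as in get_accelerate_model);
# any downstream code that looks up the "default" adapter will not find these parameters.
model = get_peft_model(base, lora_cfg, adapter_name="lora_default")
model.print_trainable_parameters()  # check whether the LoRA parameters are actually trainable

# Inspect which parameter names carry the adapter tag
for name, p in model.named_parameters():
    if p.requires_grad:
        print(name)
```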

Hi! I encountered a bug when doing Step 3 (Principle Engraving). I used self_align_merged.json, which was created from "self_align_32shards_*.jsonl" and "vicuna_dummy_data.json", to fine-tune the base model. However, I find...
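For reference, this is roughly how I built self_align_merged.json; a minimal sketch, assuming each shard is JSONL (one JSON object per line) and vicuna_dummy_data.json is a plain JSON list, so the field names and layout may differ from the repo's actual merge script:

```
# Sketch of the merge step; the file layout is an assumption on my side.
import glob
import json

merged = []

# Each self-align shard is assumed to be JSONL: one JSON object per line.
for shard in sorted(glob.glob("self_align_32shards_*.jsonl")):
    with open(shard) as f:
        merged.extend(json.loads(line) for line in f if line.strip())

# The Vicuna dummy data is assumed to be a plain JSON list of examples.
with open("vicuna_dummy_data.json") as f:
    merged.extend(json.load(f))

with open("self_align_merged.json", "w") as f:
    json.dump(merged, f, ensure_ascii=False, indent=2)
```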

Hello, thanks for your awesome work and code! However, I encountered some confusion while trying to understand how you generated the TGRT Self-Instruction data. You mentioned in the article that you first...

### Required prerequisites

- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-Alignment/safe-rlhf/discussions) that this hasn't already been reported. (+1 or comment...


Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I notice there is a setting:

```
if model_args.use_peft is True:
    ref_model = None
    ...
```
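My current understanding of why ref_model can be None here (a sketch of the idea, not the repo's actual code): when the policy is a LoRA/PEFT model, the frozen base weights already act as the reference policy, so reference log-probs can be computed by temporarily disabling the adapters:

```
# Sketch: `model` is assumed to be a peft.PeftModel wrapping the frozen base model.
import torch

def reference_logits(model, input_ids, attention_mask):
    # With adapters disabled, the forward pass uses only the frozen base
    # weights, which is exactly the reference model DPO needs.
    with torch.no_grad():
        with model.disable_adapter():
            return model(input_ids=input_ids, attention_mask=attention_mask).logits
```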

Hello, I ran into the following problem when using the remote RM:

1. Start the remote RM:

```
set -x
python -m openrlhf.cli.serve_rm \
    --reward_pretrain /mnt/GeneralModel/share/OpenRLHF/Llama-3-8b-rm-700k \
    --port 5000 \
    --bf16 \
    --attn_implementation flash_attention_2 \
    --normalize_reward \
    --max_len 8192 \
    --batch_size 16 ...
```
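For context, this is how I query the server once it is up. The route name and payload shape below are assumptions on my side; the actual endpoint and request format are defined in openrlhf.cli.serve_rm:

```
# Sketch of a client call; the /get_reward route and the payload are assumptions,
# check openrlhf.cli.serve_rm for the actual endpoint and request format.
import requests

resp = requests.post(
    "http://localhost:5000/get_reward",                # port matches --port 5000 above
    json={"query": ["Human: hello\nAssistant: hi"]},   # hypothetical payload shape
    timeout=60,
)
print(resp.json())
```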

Hello! Thanks for your awesome work! I notice that, in the training process, the model is finetuned to directly predict the 66 unsafe sentences without corresponding questions/instructions. However, in the...