alignment-handbook
Robust recipes to align language models with human and AI preferences
Hello, I observe that neither the models I trained nor the official models provided by [HuggingFace](https://huggingface.co/alignment-handbook/) match the reported results of [Zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on TruthfulQA. I used lm-evaluation-harness for evaluation, and...
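For reference, a minimal way to reproduce a TruthfulQA number is sketched below, assuming a recent lm-evaluation-harness (v0.4+); the `simple_evaluate` entry point and the `truthfulqa_mc2` task name are assumptions that differ in older harness releases, so they should be matched to whatever version produced the published scores.

```python
import lm_eval

# Hedged sketch: score Zephyr-7b-beta on TruthfulQA with lm-evaluation-harness v0.4+.
# Older releases use a different CLI and task name (e.g. truthfulqa_mc), which can
# by itself explain mismatched numbers.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```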
Hello, I have been using the Zephyr DPO recipe and the models I get are saved in float32. I am using config_full and accelerate's multi_gpu.yaml. I think the issue is...
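If the checkpoint really is float32 on disk, one workaround is to reload it in bfloat16 and re-save it. This is only a sketch: the checkpoint path is hypothetical, and whether the recipe's `torch_dtype` field was set to `bfloat16` during training is the first thing to check.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the DPO checkpoint that ended up in float32.
ckpt = "data/zephyr-7b-dpo-full"

# Reload the weights in bfloat16 and overwrite the checkpoint, halving its size on disk.
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)
model.save_pretrained(ckpt, safe_serialization=True)

# Keep the tokenizer files alongside the re-saved weights.
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.save_pretrained(ckpt)
```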
Hi, thank you for your great work! I'd like to reproduce the full-parameter DPO training run. However, I only have 10 NVIDIA A40 GPUs (46 GB of memory each)....
Because of the following Open LLM Leaderboard measurements, I want to perform QLoRA DPO without a preceding QLoRA SFT step: `alignment-handbook/zephyr-7b-dpo-qlora: +Average: 63.51; +ARC 63.65; +HSwag 85.35; -+MMLU 63.82; ++TQA: 47.14; (+)Win 79.01;`...
In reference to the post [Constitutional AI with Open LLMs](https://huggingface.co/blog/constitutional_ai), I wanted to ask whether you could share the total costs involved in generating the dataset?
Is there any flash-attention-free version?
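In case it helps, avoiding FlashAttention usually comes down to picking a different attention backend when the model is loaded. The sketch below assumes a recent Transformers release; the handbook's recipe YAMLs expose a corresponding model argument, whose exact name has changed across versions.

```python
import torch
from transformers import AutoModelForCausalLM

# Use PyTorch's built-in scaled-dot-product attention ("sdpa") or plain "eager"
# attention instead of "flash_attention_2", so the flash-attn package is not required.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",  # or "eager"
)
```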
Hello, I was seeing a warning while fine-tuning Mistral and tracked it to this line: https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L71. Because Mistral's tokenizer reports a very large model max length, the model_max_length ends up being set to 2048....
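For context, the logic around that line can be paraphrased roughly as follows; this is a sketch rather than a verbatim copy of `model_utils.py`, and the function name and the 100,000 threshold are illustrative.

```python
def resolve_max_length(tokenizer, configured_max_length=None, default=2048):
    """Rough paraphrase of the truncation-length fallback in model_utils.py."""
    if configured_max_length is not None:
        return configured_max_length
    # Tokenizers such as Mistral's report an effectively unbounded model_max_length
    # (~1e30), so fall back to a fixed default instead of trusting that value.
    if tokenizer.model_max_length > 100_000:
        return default
    return tokenizer.model_max_length
```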
In parallel with #38, though I am referring to full training instead of LoRA. When I use a different set of preferences (i.e., chosen and rejected) but still the same instructions...
Hi team, great work! I wonder whether there will be a demo / example of training reward models in a multi-GPU or DeepSpeed setting? Thanks!
I have a general question about supervised fine-tuning (SFT) for dialogue applications. Should the SFT process use the same LM objective (next-token prediction) that is used in pre-training a...
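As a point of reference, the common setup keeps the pre-training objective: the dialogue is rendered to text with a chat template and trained with ordinary next-token prediction, optionally masking the prompt tokens out of the loss. The sketch below assumes a standard Transformers causal-LM workflow; the checkpoint name is just an example of a model that ships with a chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint with a chat template; any chat-formatted model works here.
name = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "Direct Preference Optimization is ..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)

batch = tokenizer(text, return_tensors="pt")
labels = batch["input_ids"].clone()  # optionally set prompt positions to -100 to mask them

# Same causal-LM cross-entropy as pre-training: the model internally shifts the labels
# so that each position predicts the next token.
loss = model(**batch, labels=labels).loss
```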