alignment-handbook
Robust recipes to align language models with human and AI preferences
I cannot replicate the DPO results for Zephyr. I use a modified version of config_full.yaml, with the only difference being that I set gradient_accumulation_steps: 4 instead of 2, because I...
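A hedged sketch of how such an override interacts with training, since this is a common source of drift from published results; the surrounding value below is an illustrative placeholder, not the recipe's actual number:

```yaml
# Illustrative excerpt of a modified config_full.yaml (values are placeholders).
# Effective batch = per_device_train_batch_size x num_gpus x gradient_accumulation_steps,
# so changing gradient_accumulation_steps from 2 to 4 doubles the effective batch
# (and shifts the optimizer trajectory) unless another factor is halved.
per_device_train_batch_size: 4   # e.g. halved to keep the effective batch constant
gradient_accumulation_steps: 4   # the modification described in the issue
```

A shifted effective batch size is one plausible reason DPO metrics diverge from the published run even when everything else matches.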
I modified deepspeed_zero3.yaml, set num_machines to 8 and num_processes to 8, and I got the following error. What else should I do to run SFT on an 8-node platform? Thanks...
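For reference, a hedged multi-node sketch of what deepspeed_zero3.yaml could look like, assuming 8 GPUs per node: in accelerate configs, num_processes is the total process count across all machines, so setting it to 8 alongside num_machines: 8 launches only one process per node.

```yaml
# Hedged sketch of a multi-node DeepSpeed ZeRO-3 accelerate config.
# Assumes 8 nodes x 8 GPUs; IP/port values are placeholders.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
num_machines: 8
num_processes: 64           # total across all nodes: 8 nodes x 8 GPUs
machine_rank: 0             # set to 0..7, one value per node
main_process_ip: 10.0.0.1   # placeholder: address of the rank-0 node
main_process_port: 29500
mixed_precision: bf16
```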
For QLoRA+FSDP support, the dependencies should be updated: - `bitsandbytes>=0.43.0` - `accelerate>=0.28.0` - `transformers>4.38.2` - `trl>0.7.11` - `peft>0.9.0` Also, it would be wonderful to have an accelerate recipe for this too.
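On the requested recipe: a hedged sketch of what an FSDP accelerate config for QLoRA could look like. The key names follow accelerate's FSDP plugin (the `fsdp_backward_prefetch` spelling assumes accelerate>=0.28, as pinned above); the exact values the handbook would ship are an assumption.

```yaml
# Hedged sketch of an accelerate FSDP config for QLoRA + FSDP
# (single node, 8 GPUs assumed); not an official handbook recipe.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true   # load 4-bit weights once, on rank 0
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
mixed_precision: bf16
num_machines: 1
num_processes: 8
```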
Hello, I'm interested in utilizing [run_dpo](https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_dpo.py), but I'm unsure about the required parameters. Could someone provide me with some guidance on which parameters need to be passed?
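For context, the handbook scripts take a recipe YAML as their single positional argument, e.g. `accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml`, with individual fields overridable on the command line. A hedged, minimal recipe whose field names mirror the zephyr-7b-beta DPO config (values illustrative rather than authoritative):

```yaml
# Minimal DPO recipe sketch for scripts/run_dpo.py.
model_name_or_path: alignment-handbook/zephyr-7b-sft-full
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
- train_prefs
- test_prefs
beta: 0.01
learning_rate: 5.0e-7
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
num_train_epochs: 1
bf16: true
output_dir: data/zephyr-7b-dpo-full
```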
It is said that zephyr-7b-dpo-qlora is finetuned from zephyr-7b-sft-qlora. However, in the adapter config file, the base model is set to mistralai/Mistral-7B-v0.1. Also, I downloaded the model from https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora, and...
I am trying to conduct CPT with mistral-instruct-v2, but every time I notice an overshoot in the grad norm. I tried different datasets and managed to reproduce the same...
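Not a root-cause fix, but the standard knobs for damping grad-norm spikes are plain transformers TrainingArguments fields that the handbook recipes pass through; values here are illustrative:

```yaml
# Hedged mitigation sketch for grad-norm overshoot during CPT.
max_grad_norm: 1.0      # clip spiking gradients
warmup_ratio: 0.1       # a longer warmup often damps early spikes
learning_rate: 5.0e-6   # lowering the LR is another common mitigation
```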
Environment: transformers 4.39.0.dev0, trl 0.7.10, torch 2.2.2, 8 x H100 (80GB). I am encountering an issue where the training process with DPO on a multi-GPU setup gets stuck. This problem...
So I'm attempting to run the DPO LoRA script and I'm getting this error: `RuntimeError: The size of tensor a (0) must match the size of tensor b (4096)...`
Hi, thanks for your great work! I'm especially interested in the recently introduced constitutional AI tuning in this [blog post](https://huggingface.co/blog/constitutional_ai). I've found the open-source [SFT model](https://huggingface.co/alignment-handbook/mistral-7b-sft-constitutional-ai) and [DPO model](https://huggingface.co/HuggingFaceH4/mistral-7b-anthropic) on the Hugging Face Hub. However,...
Since we now have the CPT task, it would be nice to be able to feed a tokenized and packed dataset in directly.
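Since this is a feature request, here is a purely hypothetical sketch of what such a recipe option could look like; `dataset_already_tokenized` does not exist in the handbook today, and the dataset id is a placeholder:

```yaml
# Hypothetical recipe excerpt sketching the requested feature.
dataset_mixer:
  my-org/packed-cpt-corpus: 1.0   # placeholder: dataset already carrying input_ids
dataset_already_tokenized: true   # proposed flag: skip tokenization and packing
max_seq_length: 2048              # would only sanity-check the packed block size
```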