alignment-handbook
Robust recipes to align language models with human and AI preferences
After using Axolotl to SFT my Mistral-7B model, I tried to align it using DPO. At some point in the code (in the DPOTrainer initialization) the code freezes and stops...
Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I noticed there is a setting: ``` if model_args.use_peft is True: ref_model = None...
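A minimal sketch of why the reference model can be omitted when training with PEFT/QLoRA (the helper name `select_ref_model` is hypothetical, for illustration only): with adapters, the frozen base weights already serve as the reference policy, so TRL's `DPOTrainer` accepts `ref_model=None` and reuses the policy model with adapters disabled when it needs reference log-probs, instead of loading a second full copy.

```python
def select_ref_model(use_peft: bool, model_name: str):
    """Hypothetical helper mirroring the handbook's logic: return the
    reference-model argument to pass to DPOTrainer.

    When PEFT is enabled, return None: the trainer disables the adapters
    to compute reference log-probs from the frozen base weights, saving
    the memory of a second full model.
    """
    if use_peft:
        return None       # base weights (adapters off) act as the reference
    return model_name     # otherwise load a separate frozen reference model

# With QLoRA enabled, no extra 7B reference copy is materialized.
print(select_ref_model(True, "mistralai/Mistral-7B-v0.1"))   # None
print(select_ref_model(False, "mistralai/Mistral-7B-v0.1"))  # model name
```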
The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
When running the DPO script, the error is raised by the following call:

```python
#####################
# Apply chat template
#####################
raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    desc="Formatting comparisons with prompt template",
)
```
...
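A hedged pre-flight check for the Jinja error above (the function `roles_alternate` is an illustrative sketch, not part of the handbook): many chat templates require messages to strictly alternate user/assistant after an optional system turn, so a DPO dataset whose conversation starts with an assistant turn, or contains two consecutive turns from the same role, will trigger this exception.

```python
def roles_alternate(messages):
    """Return True if a list of {"role": ..., "content": ...} dicts
    satisfies the strict user/assistant alternation that many chat
    templates enforce. A single leading "system" turn is allowed."""
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]           # templates typically tolerate one system turn
    if not roles or roles[0] != "user":
        return False                # conversation must open with a user turn
    expected = "user"
    for role in roles:
        if role != expected:
            return False            # same role twice in a row, or wrong order
        expected = "assistant" if expected == "user" else "user"
    return True

ok = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
bad = [{"role": "assistant", "content": "hello"}, {"role": "user", "content": "hi"}]
print(roles_alternate(ok))   # True
print(roles_alternate(bad))  # False
```

Running a check like this over the "chosen" and "rejected" columns before `raw_datasets.map(...)` makes it easy to find the offending rows.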
Hello everyone, I'm encountering a memory issue while fine-tuning a 7b model (such as Mistral) using a repository I found. Despite having 6 H100 GPUs at my disposal, I run...
Here's the call I'm using to run the script:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file examples/hf-alignment-handbook/configs/accelerate_configs/deepspeed_zero3.yaml \
  --num_processes=2 \
  examples/hf-alignment-handbook/run_sft.py \
  examples/hf-alignment-handbook/configs/training_configs/zephyr-7b-beta/config_lora_sft.yaml \
  --load_in_4bit=true
```

Here's the full trace of the error: ``` 2023-12-01 00:05:43...
In the AI Feedback (AIF) phase, with GPT-4 serving as the teacher model, I am curious to know whether there might be a propensity for GPT-4 to assign higher ratings to...
I encountered this error (`ImportError: /flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol`) and tried many ways to solve it, but failed. Could you kindly share your version of flash-attn?
I am trying to train a Yi-34B model using a LoRA setup on multiple GPUs, but I am getting a constant loss of around 2 throughout my SFT training over 4 epochs. And inferencing...