alignment-handbook
Robust recipes to align language models with human and AI preferences
I have dialogs in the shareGPT format (see below) and for each `gpt` turn a label (thumbs up or thumbs down). But for KTO training, I have only seen datasets...
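A minimal sketch of how such data could be flattened into the unpaired prompt/completion/label rows that KTO-style training typically expects; the `conversations` key follows the shareGPT layout, while the `thumbs` field (one boolean per `gpt` turn) is an assumption about the poster's data, not something defined by the handbook:

```python
from datasets import Dataset

ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_kto(dialogs):
    """Flatten shareGPT-style dialogs into unpaired KTO rows.

    Each dialog is assumed to carry a 'conversations' list of turns plus a
    parallel 'thumbs' list with one boolean per gpt turn (hypothetical field).
    """
    rows = []
    for dialog in dialogs:
        history = []
        thumbs = iter(dialog["thumbs"])
        for turn in dialog["conversations"]:
            msg = {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            if turn["from"] == "gpt":
                rows.append({
                    "prompt": list(history),      # conversation up to this turn
                    "completion": [msg],          # the rated assistant turn
                    "label": bool(next(thumbs)),  # thumbs up -> desirable
                })
            history.append(msg)
    return Dataset.from_list(rows)
```

Each thumbs-up turn becomes a desirable example and each thumbs-down turn an undesirable one, so no paired chosen/rejected responses are needed.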
Why is zephyr-7b-dpo-lora finetuned from mistralai/Mistral-7B-v0.1 instead of the zephyr-7b-sft model?
There is an inconsistency between zephyr-7b-dpo-lora and zephyr-7b-dpo-full. The former is finetuned from mistralai/Mistral-7B-v0.1, while the latter is finetuned from the zephyr-7b-sft model. I wonder what causes this inconsistency? Also, have...
As the picture shows, there are cases where some parts of the gpt response should not be included in the backward computation. If I want to achieve this...
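For reference, the usual way to exclude token spans from the loss in causal-LM training is to set those positions in the labels tensor to -100, which the cross-entropy loss ignores. A minimal sketch, where the span indices are hypothetical:

```python
import torch

IGNORE_INDEX = -100  # positions labelled -100 are skipped by the cross-entropy loss

def mask_spans(input_ids: torch.Tensor, spans_to_ignore) -> torch.Tensor:
    """Build labels for causal-LM training while ignoring given token spans.

    `spans_to_ignore` is a list of half-open (start, end) index ranges covering
    the parts of the response that should not contribute to the loss.
    """
    labels = input_ids.clone()
    for start, end in spans_to_ignore:
        labels[start:end] = IGNORE_INDEX
    return labels

# Example: exclude tokens 10..20 of a single 64-token sequence from the loss
input_ids = torch.randint(0, 32000, (64,))
labels = mask_spans(input_ids, [(10, 20)])
```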
When chatting with zephyr-7b-dpo-lora, as shown in the figure above, only the first 'Hello' was sent by me; all the following content was generated by zephyr, including the \...
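A hedged sketch of how such a chat is usually run so that generation stops at the end of the assistant turn instead of continuing with the next role marker; the model id and generation settings are illustrative stand-ins, not the poster's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"  # stand-in for the dpo-lora checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stopping at the EOS token keeps the model from writing the next <|user|> turn itself.
output = model.generate(
    input_ids, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```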
Can we follow the normal way of:
```
for param in model.base_model.parameters():
    param.requires_grad = False
```
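As a quick sanity check (a sketch, not the handbook's code), one can freeze the base model this way and then count which parameters would still receive gradients; the small stand-in model is only there to keep the example self-contained:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

# Freeze every parameter of the base (backbone) model
for param in model.base_model.parameters():
    param.requires_grad = False

# Report how many parameters remain trainable after freezing
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```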
Hello, I modified the code to include the ConstantLengthDataset, and training stops early at around 15%. This issue doesn't occur when it is not used with the...
I'm not able to run Zephyr 7B Gemma with 4× 80GB A100s. I get the following error: `RuntimeError: The size of tensor a (0) must match the size of...`
It seems that FlashAttention is only supported on CUDA 11.6 and above. According to https://developer.nvidia.com/cuda-downloads, the latest version of CUDA (12.3) can't be downloaded for macOS. I...
In run_sft.py and run_dpo.py, it is stated that the chat template is applied, but this is not actually done. In the code below, column_names contains all the names of the columns,...
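For reference, here is a hedged sketch of how a chat template is typically applied with `datasets.map`; the column names and the zephyr tokenizer are illustrative stand-ins, not the handbook's exact code. Note that new columns created by the mapping function survive `remove_columns`, which only drops the listed original columns:

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

ds = Dataset.from_list([
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]}
])
column_names = ds.column_names  # ["messages"]

def apply_chat_template(example):
    # Adds a new "text" column; map() only removes the columns listed in
    # remove_columns, so "text" is kept while "messages" is dropped.
    example["text"] = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return example

ds = ds.map(apply_chat_template, remove_columns=column_names)
print(ds.column_names)  # ["text"]
print(ds[0]["text"])
```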
In the README.md [here](https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models), it says: `Make sure the word zephyr exists in the --model-path argument when generating the model responses...` We should also ensure the word zephyr...