alignment-handbook icon indicating copy to clipboard operation
alignment-handbook copied to clipboard

Robust recipes to align language models with human and AI preferences

Results 92 alignment-handbook issues
Sort by recently updated
recently updated
newest added

I have dialogs in the shareGPT format (see below) and for each `gpt` turn a label (thumbs up or thumbs down). But for KTO training, I have only seen datasets...

There is a misalignment between zephyr-7b-dpo-lora and zephyr-7b-dpo-full. The former one is finetuned from mistralai/Mistral-7B-v0.1. The latter is finetuned from zephyr-7b-dpo-full. I wonder what causes this misalignment ? Also, have...

![image](https://github.com/huggingface/alignment-handbook/assets/77482343/903dd930-18b3-4eec-9aba-1bc0248a5302) As the pic has shown, there are some cases that some parts of the gpt's response should not be cacluated in backward computing, if I want to achieve this...

![VX0N3YSQ@~VU@RPDR5@$KWE](https://github.com/huggingface/alignment-handbook/assets/84949179/cd22f16c-e160-43a9-b2cc-aec5b87b1ada) When chatting with zephyr-7b-dpo-lora, as shown in the fig above , only the first 'Hello' was I sent, all the following content are generated by zephyr, including the \...

Can we follow the normal way of: ``` for param in model.base_model.parameters(): param.requires_grad = False ```

Hello I modified the code to include the Constant Length Dataset and it's early stopping at around 15% of the training. This issue doesn't occur when not used with the...

I'm not able to run Zephyr 7B Gemma with 4 80GB A100s. I get the following error: ``` RuntimeError: The size of tensor a (0) must match the size of...

It seems that FlashAttention is only supported on CUDA 11.6 and above. According to https://developer.nvidia.com/cuda-downloads, it seems that the latest version of CUDA (12.3) can't be downloaded for MacOS. I...

In run_sft.py and run_dpo.py, it says that it applies the chat template. But this is not actually done. In the code below, column_names contains all the names of the columns,...

In the Readme.md [here](https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models), it says : - `Make sure the word zephyr exists in the --model-path argument when generating the model responses...` We should also ensure the word zephyr...