alignment-handbook
Robust recipes to align language models with human and AI preferences
Hello, I observe that neither the models I trained nor the official models provided by [HuggingFace](https://huggingface.co/alignment-handbook/) match the reported results of [Zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on TruthfulQA. I used lm-evaluation-harness for evaluation, and...
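For reference, a minimal way to reproduce a TruthfulQA number is sketched below, assuming a recent lm-evaluation-harness (v0.4+); the `simple_evaluate` entry point and the `truthfulqa_mc2` task name are assumptions that differ in older harness releases, so they should be matched to whatever version produced the published scores.

```python
import lm_eval

# Hedged sketch: score Zephyr-7b-beta on TruthfulQA with lm-evaluation-harness v0.4+.
# Older releases use a different CLI and task name (e.g. truthfulqa_mc), which can
# by itself explain mismatched numbers.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```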
Hello, I have been using the Zephyr DPO recipe and the models I get are saved in float32. I am using config_full and accelerate's multi_gpu.yaml. I think the issue is...
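If the checkpoint really is float32 on disk, one workaround is to reload it in bfloat16 and re-save it. This is only a sketch: the checkpoint path is hypothetical, and whether the recipe's `torch_dtype` field was set to `bfloat16` during training is the first thing to check.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the DPO checkpoint that ended up in float32.
ckpt = "data/zephyr-7b-dpo-full"

# Reload the weights in bfloat16 and overwrite the checkpoint, halving its size on disk.
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)
model.save_pretrained(ckpt, safe_serialization=True)

# Keep the tokenizer files alongside the re-saved weights.
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.save_pretrained(ckpt)
```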
Hi, thank you for your great work! I'd like to reproduce the full-parameter DPO training run. However, I only have 10 NVIDIA A40 GPUs (46 GB of memory each)....
Because of the following Open LLM Leaderboard measurements, I want to perform QLoRA DPO without a preceding QLoRA SFT step: `alignment-handbook/zephyr-7b-dpo-qlora: +Average: 63.51; +ARC 63.65; +HSwag 85.35; -+MMLU 63.82; ++TQA: 47.14; (+)Win 79.01;`...
In reference to the post [Constitutional AI with Open LLMs](https://huggingface.co/blog/constitutional_ai), I wanted to ask whether you could share the total costs involved in generating the dataset?
Is there any flash-attention-free version?
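In case it helps, avoiding FlashAttention usually comes down to picking a different attention backend when the model is loaded. The sketch below assumes a recent Transformers release; the handbook's recipe YAMLs expose a corresponding model argument, whose exact name has changed across versions.

```python
import torch
from transformers import AutoModelForCausalLM

# Use PyTorch's built-in scaled-dot-product attention ("sdpa") or plain "eager"
# attention instead of "flash_attention_2", so the flash-attn package is not required.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",  # or "eager"
)
```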
Hello, I was seeing a warning while fine-tuning Mistral and tracked it to this line: https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L71. Because Mistral's tokenizer reports a very large model max length, the model_max_length ends up being set to 2048....
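For context, the logic around that line can be paraphrased roughly as follows; this is a sketch rather than a verbatim copy of `model_utils.py`, and the function name and the 100,000 threshold are illustrative.

```python
def resolve_max_length(tokenizer, configured_max_length=None, default=2048):
    """Rough paraphrase of the truncation-length fallback in model_utils.py."""
    if configured_max_length is not None:
        return configured_max_length
    # Tokenizers such as Mistral's report an effectively unbounded model_max_length
    # (~1e30), so fall back to a fixed default instead of trusting that value.
    if tokenizer.model_max_length > 100_000:
        return default
    return tokenizer.model_max_length
```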
In parallel with #38, though I am referring to full training instead of LoRA. When I use a different set of preferences (i.e., chosen and rejected) but still the same instructions...
Hi team, great work! I wonder whether there will be a demo / example of training reward models in a multi-GPU or DeepSpeed setting? Thanks!
I have a general question about supervised fine-tuning (SFT) for dialogue applications. Should the SFT process use the same LM objective (next-token prediction) that is used in pre-training a...
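As a point of reference, the common setup keeps the pre-training objective: the dialogue is rendered to text with a chat template and trained with ordinary next-token prediction, optionally masking the prompt tokens out of the loss. The sketch below assumes a standard Transformers causal-LM workflow; the checkpoint name is just an example of a model that ships with a chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint with a chat template; any chat-formatted model works here.
name = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "Direct Preference Optimization is ..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)

batch = tokenizer(text, return_tensors="pt")
labels = batch["input_ids"].clone()  # optionally set prompt positions to -100 to mask them

# Same causal-LM cross-entropy as pre-training: the model internally shifts the labels
# so that each position predicts the next token.
loss = model(**batch, labels=labels).loss
```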