alignment-handbook
Robust recipes to align language models with human and AI preferences
I have dialogs in the shareGPT format (see below) and for each `gpt` turn a label (thumbs up or thumbs down). But for KTO training, I have only seen datasets...
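A minimal sketch of how such data could be flattened into the unpaired prompt/completion/label rows that KTO-style training typically expects; the `conversations` key follows the shareGPT layout, while the `thumbs` field (one boolean per `gpt` turn) is an assumption about the poster's data, not something defined by the handbook:

```python
from datasets import Dataset

ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_kto(dialogs):
    """Flatten shareGPT-style dialogs into unpaired KTO rows.

    Each dialog is assumed to carry a 'conversations' list of turns plus a
    parallel 'thumbs' list with one boolean per gpt turn (hypothetical field).
    """
    rows = []
    for dialog in dialogs:
        history = []
        thumbs = iter(dialog["thumbs"])
        for turn in dialog["conversations"]:
            msg = {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            if turn["from"] == "gpt":
                rows.append({
                    "prompt": list(history),      # conversation up to this turn
                    "completion": [msg],          # the rated assistant turn
                    "label": bool(next(thumbs)),  # thumbs up -> desirable
                })
            history.append(msg)
    return Dataset.from_list(rows)
```

Each thumbs-up turn becomes a desirable example and each thumbs-down turn an undesirable one, so no paired chosen/rejected responses are needed.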
Why is zephyr-7b-dpo-lora finetuned from mistralai/Mistral-7B-v0.1 instead of the zephyr-7b-sft model?
There is an inconsistency between zephyr-7b-dpo-lora and zephyr-7b-dpo-full. The former is finetuned from mistralai/Mistral-7B-v0.1, while the latter is finetuned from the zephyr-7b-sft model. I wonder what causes this inconsistency? Also, have...
As the picture shows, there are cases where some parts of the gpt response should not be included in the backward computation. If I want to achieve this...
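For reference, the usual way to exclude token spans from the loss in causal-LM training is to set those positions in the labels tensor to -100, which the cross-entropy loss ignores. A minimal sketch, where the span indices are hypothetical:

```python
import torch

IGNORE_INDEX = -100  # positions labelled -100 are skipped by the cross-entropy loss

def mask_spans(input_ids: torch.Tensor, spans_to_ignore) -> torch.Tensor:
    """Build labels for causal-LM training while ignoring given token spans.

    `spans_to_ignore` is a list of half-open (start, end) index ranges covering
    the parts of the response that should not contribute to the loss.
    """
    labels = input_ids.clone()
    for start, end in spans_to_ignore:
        labels[start:end] = IGNORE_INDEX
    return labels

# Example: exclude tokens 10..20 of a single 64-token sequence from the loss
input_ids = torch.randint(0, 32000, (64,))
labels = mask_spans(input_ids, [(10, 20)])
```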
When chatting with zephyr-7b-dpo-lora, as shown in the figure above, only the first 'Hello' was sent by me; all the following content was generated by zephyr, including the \...
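A hedged sketch of how such a chat is usually run so that generation stops at the end of the assistant turn instead of continuing with the next role marker; the model id and generation settings are illustrative stand-ins, not the poster's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"  # stand-in for the dpo-lora checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stopping at the EOS token keeps the model from writing the next <|user|> turn itself.
output = model.generate(
    input_ids, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```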
Can we follow the normal way of:
```
for param in model.base_model.parameters():
    param.requires_grad = False
```
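As a quick sanity check (a sketch, not the handbook's code), one can freeze the base model this way and then count which parameters would still receive gradients; the small stand-in model is only there to keep the example self-contained:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

# Freeze every parameter of the base (backbone) model
for param in model.base_model.parameters():
    param.requires_grad = False

# Report how many parameters remain trainable after freezing
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```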
Hello, I modified the code to include the ConstantLengthDataset, and training stops early at around 15%. This issue doesn't occur when it is not used with the...
I'm not able to run Zephyr 7B Gemma with 4× 80GB A100s. I get the following error: `RuntimeError: The size of tensor a (0) must match the size of...`
It seems that FlashAttention is only supported on CUDA 11.6 and above. According to https://developer.nvidia.com/cuda-downloads, the latest version of CUDA (12.3) can't be downloaded for macOS. I...
In run_sft.py and run_dpo.py, it is stated that the chat template is applied, but this is not actually done. In the code below, column_names contains all the names of the columns,...
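For reference, here is a hedged sketch of how a chat template is typically applied with `datasets.map`; the column names and the zephyr tokenizer are illustrative stand-ins, not the handbook's exact code. Note that new columns created by the mapping function survive `remove_columns`, which only drops the listed original columns:

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

ds = Dataset.from_list([
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]}
])
column_names = ds.column_names  # ["messages"]

def apply_chat_template(example):
    # Adds a new "text" column; map() only removes the columns listed in
    # remove_columns, so "text" is kept while "messages" is dropped.
    example["text"] = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return example

ds = ds.map(apply_chat_template, remove_columns=column_names)
print(ds.column_names)  # ["text"]
print(ds[0]["text"])
```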
In the README.md [here](https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models), it says: `Make sure the word zephyr exists in the --model-path argument when generating the model responses...` We should also ensure the word zephyr...