alignment-handbook
Robust recipes to align language models with human and AI preferences
Hi, I would like to draw attention to issue #38. It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems...
DPO loss
I am training DPO with LoRA, and the loss shows odd behavior: it decreases sharply at the beginning of each epoch. I wonder if anyone has run into the same issue before?
Hello, I am so impressed by your models. I tried fine-tuning your models with my data, and the evaluation_loss does not improve, as shown in the image...
It is possible to download and use this entire repo on Windows, with the exception of DeepSpeed. After trying to install the alignment-handbook package, I found you can simply remove...
Has anyone else experienced cases where training finishes early as max length increases? I ran this script on a custom dataset with the following config. No CUDA errors; it just...
Hi, I noticed that the [model card](https://huggingface.co/alignment-handbook/zephyr-7b-dpo-lora) says the Adam optimizer is used. However, the `config_lora.yaml` file sets `optim: rmsprop`. Could you tell me which one is...
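For reference, the discrepancy comes down to a single line in the training config. A minimal sketch follows; the `adamw_torch` alternative is an assumption based on the standard `transformers` `TrainingArguments` optimizer names, not something the recipe itself sets:

```yaml
# Excerpt in the spirit of config_lora.yaml: the optimizer actually used at
# train time is whatever `optim` names, regardless of what the model card says.
optim: rmsprop
# To truly train with Adam, one would instead set (assumed value, taken from
# the standard transformers TrainingArguments choices):
# optim: adamw_torch
```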
Had a question about the `max_seq_length` hyperparameter. I just started training and set the SFT config as below:

```yaml
# Model arguments
model_name_or_path: mistralai/Mistral-7B-v0.1
model_revision: main
...
```
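For anyone comparing, here is a minimal sketch of how `max_seq_length` typically sits in an SFT recipe config; the value shown is illustrative, not the poster's:

```yaml
# Illustrative SFT excerpt (values are assumptions, not from the issue above)
model_name_or_path: mistralai/Mistral-7B-v0.1
max_seq_length: 2048  # examples are truncated/packed to this many tokens
```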
The default learning rate in the DPO recipe config is set to 5e-7, while https://huggingface.co/Intel/neural-chat-7b-v3 was trained with a learning rate of 1e-4 (using, of course, a different dataset...
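Side by side, the two settings being compared (the 5e-7 default is quoted from the recipe config; the 1e-4 figure is from the neural-chat model card, which used a different dataset and setup):

```yaml
# DPO recipe default cited in this issue:
learning_rate: 5.0e-7
# Intel/neural-chat-7b-v3 reportedly used (different dataset/setup):
# learning_rate: 1.0e-4
```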
I attempted to fine-tune a 6-billion-parameter model using 8 A100 GPUs, but the training process was interrupted. On the first attempt, it stopped at 0.15 epochs, and on...