alignment-handbook
Robust recipes to align language models with human and AI preferences
Hi, I would like to draw attention to issue #38. It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems...
DPO loss
I am training DPO with LoRA, and the loss shows odd behavior: it decreases sharply at the beginning of each epoch. I wonder if anyone has run into the same issue before?
Hello, I am so impressed by your models. I tried fine-tuning your models with my data, and the evaluation_loss does not improve, as shown in the image...
It is possible to download and use this entire repo on Windows, with the exception of DeepSpeed. After trying to install the alignment-handbook package, I found you can simply remove...
Has anyone else experienced cases where training finishes early as max length increases? I ran this script on a custom dataset with the following config. No CUDA errors; it just...
Hi, I noticed that the [model card](https://huggingface.co/alignment-handbook/zephyr-7b-dpo-lora) says the Adam optimizer is used. However, the `config_lora.yaml` file sets `optim: rmsprop`. Could you tell me which one is...
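For reference, the discrepancy comes down to a single line in the training config. A minimal sketch follows; the `adamw_torch` alternative is an assumption based on the standard `transformers` `TrainingArguments` optimizer names, not something the recipe itself sets:

```yaml
# Excerpt in the spirit of config_lora.yaml: the optimizer actually used at
# train time is whatever `optim` names, regardless of what the model card says.
optim: rmsprop
# To truly train with Adam, one would instead set (assumed value, taken from
# the standard transformers TrainingArguments choices):
# optim: adamw_torch
```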
Had a question about the `max_seq_length` hyperparameter. I just started training and set the SFT config as below:

```yaml
# Model arguments
model_name_or_path: mistralai/Mistral-7B-v0.1
model_revision: main
...
```
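For anyone comparing, here is a minimal sketch of how `max_seq_length` typically sits in an SFT recipe config; the value shown is illustrative, not the poster's:

```yaml
# Illustrative SFT excerpt (values are assumptions, not from the issue above)
model_name_or_path: mistralai/Mistral-7B-v0.1
max_seq_length: 2048  # examples are truncated/packed to this many tokens
```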
The default learning rate in the DPO recipe config is set to 5e-7, while https://huggingface.co/Intel/neural-chat-7b-v3 was trained with a learning rate of 1e-4 (using, of course, a different dataset...
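Side by side, the two settings being compared (the 5e-7 default is quoted from the recipe config; the 1e-4 figure is from the neural-chat model card, which used a different dataset and setup):

```yaml
# DPO recipe default cited in this issue:
learning_rate: 5.0e-7
# Intel/neural-chat-7b-v3 reportedly used (different dataset/setup):
# learning_rate: 1.0e-4
```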
I attempted to fine-tune a 6-billion-parameter model using 8 A100 GPUs, but the training process was interrupted. On the first attempt, it stopped at 0.15 epochs, and on...