
Robust recipes to align language models with human and AI preferences

90 alignment-handbook issues

Thank you for your work! I was using FSDP + QLoRA to fine-tune Llama 3 70B on 8× A100 80GB GPUs, and I encountered this error: ```shell Traceback (most recent call...

Hi @edbeeching, thanks for the great work in ablating the KTO/IPO/DPO algorithms in #104. I notice that in the referenced [blog](https://huggingface.co/blog/pref-tuning), it says the best-performing model for...

Hi team, great work! QDoRA seems to perform better than QLoRA; see [Efficient finetuning of Llama 3 with FSDP QDoRA](https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html). I wonder whether there will be a demo /...

I added a 4-bit load option to the LoRA-training-with-ZeRO-3 command on two or more GPUs to combine QLoRA with ZeRO-3, but the program encountered the...

I downloaded a dataset from the Hugging Face Hub and want to load it locally, but the script still tries to download it from the Hub and place it in the cache. How can I...
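A minimal sketch of pointing the `datasets` library at a local copy instead of the Hub; the paths below are placeholders, and the handbook's own data-loading code may differ:

```python
from datasets import load_dataset, load_from_disk

# Option 1: a dataset previously saved with `Dataset.save_to_disk`
# (the path is a placeholder for wherever the local copy lives).
ds = load_from_disk("/data/my_dataset")

# Option 2: load local data files (e.g. Parquet) directly, without touching the Hub.
ds = load_dataset(
    "parquet",
    data_files={"train": "/data/my_dataset/train-*.parquet"},
)
```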

## Description As briefly discussed with @lewtun this morning, this PR adds the `scripts/run_kto.py` script to the `alignment-handbook` for fine-tuning LLMs with the `trl.KTOTrainer`. The script should work as is,...
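A minimal sketch of what such a KTO fine-tuning script might look like with `trl.KTOTrainer`; the model and dataset names are placeholders, and argument names (e.g. `tokenizer` vs. `processing_class`) vary across `trl` versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# KTO uses unpaired preference data: a prompt, a completion, and a boolean label
# marking each completion as desirable or undesirable.
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")  # placeholder dataset

training_args = KTOConfig(
    output_dir="kto-model",
    beta=0.1,                # strength of the implicit KL penalty toward the reference model
    desirable_weight=1.0,    # loss weight for desirable completions
    undesirable_weight=1.0,  # loss weight for undesirable completions
)

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```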

## Description Since Mistral recently marked their repository at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 as gated, I'm afraid the account whose `HF_TOKEN` is set as a secret for the CI will...

I forgot the `do_train` flag when creating the CPT script, or left it out for some reason, but I think it would still be useful to add it.

Hello, thank you for sharing this awesome resource! I have a question regarding models that already have a chat template, like "mistralai/Mistral-7B-Instruct-v0.1". I'm planning on using the non-packed dataset....
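For a model that already ships a chat template, a minimal sketch of formatting a conversation with the built-in template (rather than setting a custom one); the example messages are made up:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is preference tuning?"},
    {"role": "assistant", "content": "It aligns a model with human or AI preferences."},
]

# Uses the chat template already bundled with the tokenizer, so no custom
# template needs to be defined before SFT on a non-packed dataset.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```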

from the README in `/scripts`:
```yaml
datasets_mixer:
  dataset_1: 0.5   # Use 50% of the training examples
  dataset_2: 0.66  # Use 66% of the training examples
  dataset_3: 0.10  # Use 10%...
```
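As an illustration of the semantics only (not the handbook's actual mixer implementation), the fractions can be read as subsampling each dataset before concatenation; the dataset names below are placeholders:

```python
from datasets import concatenate_datasets, load_dataset

# Hypothetical mixer fractions, mirroring the YAML above.
mixer = {"dataset_1": 0.5, "dataset_2": 0.66, "dataset_3": 0.10}

parts = []
for name, frac in mixer.items():
    ds = load_dataset(name, split="train")
    # Keep only the requested fraction of the training examples.
    parts.append(ds.shuffle(seed=42).select(range(int(frac * len(ds)))))

train_dataset = concatenate_datasets(parts)
```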