alignment-handbook
Robust recipes to align language models with human and AI preferences
Hi, I wonder which TruthfulQA task you focus on during evaluation: MC1, MC2, or the generation task?
I used the Hugging Face pipeline to run an inference task, but found that my fine-tuned model and the original `HuggingFaceH4/zephyr-7b-beta` generate exactly the same outputs. Does anyone have any clue about this?
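For anyone hitting the same thing, here is a minimal sketch for comparing two checkpoints side by side with the `transformers` pipeline (the fine-tuned path below is a placeholder). With greedy decoding, genuinely different weights should produce different continuations, so identical outputs usually mean the fine-tuned weights were never actually loaded or merged:

```python
# Minimal sketch: compare greedy outputs of the base model and a
# fine-tuned checkpoint. "path/to/my-finetuned-model" is a placeholder.
import torch
from transformers import pipeline

# Prompt formatted with Zephyr's chat markup.
prompt = "<|user|>\nSummarize DPO in one sentence.</s>\n<|assistant|>\n"

for name in ["HuggingFaceH4/zephyr-7b-beta", "path/to/my-finetuned-model"]:
    pipe = pipeline("text-generation", model=name,
                    torch_dtype=torch.bfloat16, device_map="auto")
    out = pipe(prompt, max_new_tokens=64, do_sample=False)
    print(f"=== {name} ===\n{out[0]['generated_text']}\n")
```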
I was looking at the logs of your training (from this [json](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta/resolve/main/trainer_state.json?download=true) file) and realized that the learning-rate scheduling is messed up. It's related to TRL's `ConstantLengthDataset` not computing its...
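For context: when the train dataset is an iterable without a `__len__` (as with TRL's `ConstantLengthDataset`), the `Trainer` cannot infer the total number of training steps, so the learning-rate scheduler gets built over the wrong horizon. A sketch of the usual workaround, with illustrative values, is to pass an explicit `max_steps`:

```python
# Sketch: give the Trainer an explicit step budget so the LR scheduler
# is constructed over the right horizon when the dataset has no length.
# All values here are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sft-output",
    max_steps=2000,              # explicit horizon for the scheduler
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```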
## Abstract
In the rapidly evolving field of artificial intelligence (AI), aligning AI systems with human values and intentions, known as AI alignment, is of paramount importance. This whitepaper introduces...
The script errors out only with Yi 34B Chat. I have tried Llama2 7/13B and SUSTech/SUS-Chat-34B and they all work. Yi 34B Chat has consistently been running into the following...
Hi, what is the best way to run this on my high-performance laptop? Should this work at all? Can I estimate how many days or weeks it will run? Thanks in advance...
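As a rough answer, training time can be bounded with the common ~6 · parameters · tokens training-FLOPs rule of thumb. A sketch with entirely illustrative numbers (model size, token budget, GPU throughput, and utilization are all assumptions, not measurements):

```python
# Back-of-the-envelope estimate using the ~6 * N * T training-FLOPs rule.
# Every number below is an assumption, not a measurement.
params = 7e9                  # 7B-parameter model
tokens = 1e9                  # tokens processed during fine-tuning
flops_needed = 6 * params * tokens

peak_flops = 20e12            # assumed laptop GPU peak: 20 TFLOP/s
utilization = 0.3             # assumed sustained utilization
seconds = flops_needed / (peak_flops * utilization)
print(f"roughly {seconds / 86400:.1f} days")   # ~81 days with these guesses
```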
Just wanted to report a crash while training. **Error message:** `[process exited with code 1 (0x00000001)]` **Command I used to start the process:** `ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py...
I've run the training on two machines without changing any hyperparameters except the per-device batch size and gradient accumulation steps, adjusted to keep the global batch size the same. The first run is exactly...
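For reference, the invariant being matched here is global batch size = per-device batch size × gradient-accumulation steps × number of GPUs. A quick sanity check with illustrative values:

```python
# Sanity-check that two launch configurations share the same global
# batch size. All values are illustrative.
def global_batch(per_device: int, grad_accum: int, num_gpus: int) -> int:
    return per_device * grad_accum * num_gpus

assert global_batch(8, 2, 8) == global_batch(4, 4, 8) == 128
```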
The paper evaluates on ARC, HellaSwag, MMLU, and TruthfulQA, but this repo does not reference these evals. Adding a short explanation of these evals (e.g., in https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models) would be nice.
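Until such docs land, here is a sketch of running those four tasks with EleutherAI's lm-evaluation-harness (v0.4+ Python API). Task names follow the harness's conventions; note the Open LLM Leaderboard uses task-specific few-shot counts (25/10/5/0), which this bare call does not set, so the exact paper settings may differ:

```python
# Sketch: run the four Open LLM Leaderboard-style tasks via lm-eval's
# Python API. Few-shot counts and prompt formats are left at defaults.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```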
Recently, I attempted to fit DPO on my own dataset. Initially, I tried to reproduce the results of your LoRA model (7.43 on MT-Bench). However, I encountered some issues...
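For anyone reproducing this, a minimal sketch of DPO with a LoRA adapter via TRL. The tiny inline dataset, hyperparameters, and target modules are illustrative, not the handbook's recipe, and argument names have moved between TRL releases (e.g. `beta` now lives in `DPOConfig`, and `tokenizer` was later renamed `processing_class`), so check your installed version:

```python
# Minimal DPO + LoRA sketch with TRL. Dataset and hyperparameters are
# illustrative placeholders, not the handbook's actual configuration.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "HuggingFaceH4/mistral-7b-sft-beta"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data needs plain-text "prompt", "chosen", "rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is the capital of France?"],
    "chosen":   ["The capital of France is Paris."],
    "rejected": ["France has no capital city."],
})

peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

args = DPOConfig(
    output_dir="zephyr-dpo-lora",
    beta=0.1,                        # strength of the KL penalty
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
)

# With a peft_config, TRL builds the frozen reference model internally.
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset,
                     tokenizer=tokenizer, peft_config=peft_config)
trainer.train()
```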