alignment-handbook
Robust recipes to align language models with human and AI preferences
Running the code in a Python shell succeeds, but an error occurs when I use "accelerate launch"...
Current training uses ConstantLengthDataset. This dataset returns a fixed number of tokens (2048) at every step; however, the total number of steps is calculated from the number of samples. I...
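For context, a minimal sketch of the packing behaviour being described here (an illustration only, not the actual `trl` `ConstantLengthDataset` code; the GPT-2 tokenizer and the sample texts are placeholders):

```python
# Illustration of token packing -- NOT the actual trl ConstantLengthDataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

def pack_examples(texts, seq_length=2048):
    """Concatenate tokenized texts and slice them into fixed-length chunks."""
    buffer = []
    for text in texts:
        buffer.extend(tokenizer(text)["input_ids"])
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]  # one fixed-length training example
            buffer = buffer[seq_length:]

texts = ["a short sample", "a much longer sample " * 2000]
chunks = list(pack_examples(texts))
# The number of packed chunks depends on the total token count, not on
# len(texts), so step counts derived from raw sample counts can diverge.
print(len(texts), len(chunks))
```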
You claim that "[In practice, we find comparable performance for both full and LoRA fine-tuning, with the latter having the advantage of producing small adapter weights that are fast to...
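For readers weighing the two options, a rough sketch of why LoRA adapters are lightweight to store and share, using `peft` (the base model and LoRA hyperparameters below are placeholders, not the handbook's recipe values):

```python
# Sketch: LoRA trains and saves only small low-rank adapter matrices,
# not the full model weights. Placeholder model and hyperparameters.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(
    model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
)
peft_model.print_trainable_parameters()  # a small fraction of total params

# peft_model.save_pretrained("adapter/")  # writes only the adapter weights
```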
I am planning to run SFT on real chatlogs so naturally I don't have the `prompt` field like in the Ultrachat dataset. AFAICT, this field is not used to perform...
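A hedged sketch of what that would look like: formatting raw chat logs into the `messages` structure that the chat template consumes, with no top-level `prompt` column (the Zephyr tokenizer and the example conversation are assumptions for illustration):

```python
# Sketch: chat logs as a `messages` list; no separate `prompt` field needed
# when the chat template is applied to the conversation directly.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

example = {
    "messages": [
        {"role": "user", "content": "Hi, I need help with my order."},
        {"role": "assistant", "content": "Sure, can you share the order ID?"},
    ]
}

text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
print(text)
```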
Hi, when I ran the dpo finetuning code, I noticed that there is a warning in the logging output `[WARNING|tokenization_utils_base.py:3831] 2023-12-06 16:44:52,195 >> Token indices sequence length is longer than...
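A small sketch of where that warning typically originates: tokenizing a sequence longer than the tokenizer's `model_max_length` trips the length check in `tokenization_utils_base`, even if the sequence is later truncated or packed into shorter chunks (GPT-2 and the dummy text are placeholders):

```python
# Sketch: reproducing the "Token indices sequence length is longer than..."
# warning. Placeholder tokenizer; gpt2 has model_max_length = 1024.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

long_text = "word " * 5000
ids = tokenizer(long_text)["input_ids"]  # emits the warning
print(len(ids), tokenizer.model_max_length)

# Explicit truncation keeps the sequence within bounds; when downstream
# packing/truncation handles long inputs anyway, the warning is benign.
ids = tokenizer(long_text, truncation=True)["input_ids"]
print(len(ids))
```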
This issue collects links to community feedback on the type of content to include in the handbook. Feel free to post a comment below with other ideas / requests! *...
I noticed that the alignment-handbook doesn't mask out the loss computed on the user and system inputs. To my knowledge, many SFT implementations choose to ignore these tokens. I'm curious about...
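For reference, TRL provides a collator that does this kind of masking; a sketch of its use (the Zephyr tokenizer and the `<|assistant|>` response template are assumptions and must match your chat template's assistant marker):

```python
# Sketch: mask non-assistant tokens from the loss. Tokens before the
# response template get label -100, so only completions contribute.
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

collator = DataCollatorForCompletionOnlyLM(
    response_template="<|assistant|>",
    tokenizer=tokenizer,
)
```

Note that this collator only works with `packing=False`, which is one practical reason a packed-training setup may compute the loss over all tokens instead.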
Hi! Thanks again for the awesome repo. I have a small question regarding the global batch size of DPO training reported in the paper vs used in the code base....
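For anyone checking the numbers, the global batch size follows from three config values; a sketch of the arithmetic (the values below are assumptions, not the recipe's actual settings):

```python
# Sketch: effective (global) batch size =
#   per_device_train_batch_size x gradient_accumulation_steps x num_gpus
per_device_train_batch_size = 8  # assumed values, not the recipe's
gradient_accumulation_steps = 1
num_gpus = 8

global_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(global_batch_size)  # 64
```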
I work as a CMS admin at my company. I have around 1 million emails back and forth with our customers. How can I utilize these emails to make a chatbot...
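One possible first step, sketched below: convert each email thread into the `messages` format used for SFT. The field names (`sender`, `body`) are assumptions about the email export, not part of the handbook:

```python
# Sketch: map an email thread onto chat roles for SFT-style training data.
# Field names are hypothetical; adapt them to your email export format.
def thread_to_messages(thread):
    messages = []
    for email in thread:
        role = "user" if email["sender"] == "customer" else "assistant"
        messages.append({"role": role, "content": email["body"]})
    return {"messages": messages}

thread = [
    {"sender": "customer", "body": "My invoice looks wrong."},
    {"sender": "admin", "body": "Thanks for flagging, we will reissue it."},
]
print(thread_to_messages(thread))
```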
It seems that the system prompt is left as `\n`, or rather blank. Inspecting UltraChat (https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k?row=5), it appears that no system prompt is added to the dataset. There must be...
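A sketch of the behaviour being described: if preprocessing prepends an empty system message when a row has none, the chat template renders a blank system block (the Zephyr tokenizer and the insertion step are assumptions for illustration):

```python
# Sketch: UltraChat rows carry only user/assistant turns; prepending an
# empty system message yields a blank "<|system|>" block when rendered.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is RLHF?"},
    {"role": "assistant", "content": "Reinforcement learning from human feedback."},
]

# Assumed preprocessing step: insert an empty system prompt if none exists.
if messages[0]["role"] != "system":
    messages.insert(0, {"role": "system", "content": ""})

print(tokenizer.apply_chat_template(messages, tokenize=False))
```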