axolotl
DPO Trainer Incorrectly Inserts BoS Before Chosen and Rejected Prompts for Llama3
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
The BoS should only appear at the start of the prompt.
Current behaviour
The BoS token is inserted at the start of the prompt and also at the start of the chosen and rejected responses, as the debug output shows:
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:91] [PID:718] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:92] [PID:718] [RANK:0] CHOSEN RESPONSE: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:93] [PID:718] [RANK:0] REJECTED RESPONSE: <|begin_of_text|>(128000)
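To make the symptom concrete, here is a minimal sketch of how a duplicated BoS can arise when the prompt and each response are tokenized in separate calls. This is not axolotl's code: `toy_encode` and `strip_leading_bos` are hypothetical helpers, and the toy tokenizer only mimics a tokenizer that prepends BoS on every call. The only value taken from the report is the BoS id 128000 from the log above.

```python
BOS_ID = 128000  # id of <|begin_of_text|>, as printed in the debug log


def toy_encode(text, add_bos=True):
    """Hypothetical stand-in for a tokenizer that prepends BoS on every call."""
    ids = [ord(ch) for ch in text]  # fake token ids, one per character
    return [BOS_ID] + ids if add_bos else ids


def strip_leading_bos(ids, bos_id=BOS_ID):
    """Drop a single leading BoS token if one is present."""
    return ids[1:] if ids and ids[0] == bos_id else ids


# Each field is encoded in its own call, so each one starts with BoS.
prompt = toy_encode("Q: 2+2?")
chosen = toy_encode("4")
rejected = toy_encode("5")

# Concatenating prompt and response as-is leaves a second BoS mid-sequence.
buggy_sequence = prompt + chosen
# Expected behavior: BoS appears only once, at the start of the prompt.
expected_sequence = prompt + strip_leading_bos(chosen)

print(buggy_sequence.count(BOS_ID))     # 2
print(expected_sequence.count(BOS_ID))  # 1
```

Whether the real cause is a separate tokenizer call per field or a BoS string embedded by the prompt strategy is an assumption here; the sketch only shows what the expected and observed token sequences look like.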
Steps to reproduce
Run a DPO tune using the chatml.intel dataset type. Preprocess the dataset with the --debug flag and you'll see the BoS token printed at the start of the prompt, the chosen response, and the rejected response.
Config yaml
rl: dpo
datasets:
  - ds_type: json
    data_files:
      - combinedDPO.json
    split: train
    type: chatml.intel
Possible solution
No response
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
Whatever version the latest docker uses.
axolotl branch-commit
The latest commit that the docker is using.
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.