axolotl DPO Trainer Incorrectly Inserts BoS Before Chosen and Rejected Prompts for Llama3

Catgat opened this issue.

Expected behavior: the BoS token should appear only once, at the start of the prompt.

Current behavior: the BoS token is inserted at the start of the prompt and also at the start of the chosen and rejected responses, as the debug output shows:

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:91] [PID:718] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:92] [PID:718] [RANK:0] CHOSEN RESPONSE: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:93] [PID:718] [RANK:0] REJECTED RESPONSE: <|begin_of_text|>(128000)
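One plausible explanation, not confirmed in this issue, is that the prompt, the chosen response, and the rejected response are tokenized separately, each with the tokenizer's default of adding special tokens, so every segment gets its own <|begin_of_text|>. The sketch below illustrates that effect with a hypothetical toy tokenizer (toy_encode and build_pair are made-up names, not axolotl or Hugging Face code). Encoding only the responses without special tokens leaves a single BoS at the front. With a real Hugging Face tokenizer, the corresponding switch is the add_special_tokens=False argument.

```python
# Hypothetical illustration only: a toy tokenizer, not axolotl's DPO code.
BOS_ID = 128000  # <|begin_of_text|> in the Llama 3 vocabulary, as in the debug log


def toy_encode(text: str, add_special_tokens: bool = True) -> list[int]:
    """Mimic a tokenizer whose default behavior prepends the BoS token."""
    # Fake, deterministic IDs per word (always far below BOS_ID).
    ids = [sum(map(ord, word)) % 1000 + 1 for word in text.split()]
    return [BOS_ID] + ids if add_special_tokens else ids


def build_pair(prompt: str, response: str, strip_response_bos: bool) -> list[int]:
    """Concatenate prompt and response IDs the way a preference pair is assembled."""
    prompt_ids = toy_encode(prompt)  # the prompt keeps its single leading BoS
    response_ids = toy_encode(response, add_special_tokens=not strip_response_bos)
    return prompt_ids + response_ids


prompt, chosen = "What is 2 + 2?", "4"
buggy = build_pair(prompt, chosen, strip_response_bos=False)
fixed = build_pair(prompt, chosen, strip_response_bos=True)
print(buggy.count(BOS_ID))  # 2: one BoS before the prompt, one before the response
print(fixed.count(BOS_ID))  # 1: only the leading BoS remains
```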

Steps to reproduce: run a DPO tune using chatml.intel, then preprocess the dataset with the --debug flag. The debug output shows the BoS token at the start of the prompt and at the start of both responses. Relevant config:

rl: dpo
datasets:
  - ds_type: json
    data_files: 
      - combinedDPO.json
    split: train
    type: chatml.intel
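The --debug inspection above can also be turned into an automated check. This is a minimal sketch, assuming you already have the token ID lists for the prompt and for each response; find_bos_problems is a hypothetical helper, not part of axolotl. It encodes the expected behavior described in this issue: exactly one BoS, at the start of the prompt, and none in the chosen or rejected responses.

```python
BOS_ID = 128000  # <|begin_of_text|> in the Llama 3 vocabulary, as in the debug log


def find_bos_problems(prompt_ids, chosen_ids, rejected_ids, bos_id=BOS_ID):
    """Hypothetical check: return a list of problems; an empty list means the example looks right."""
    problems = []
    if not prompt_ids or prompt_ids[0] != bos_id:
        problems.append("prompt does not start with BoS")
    if prompt_ids.count(bos_id) > 1:
        problems.append("prompt contains more than one BoS")
    for name, ids in (("chosen", chosen_ids), ("rejected", rejected_ids)):
        if bos_id in ids:
            problems.append(f"{name} response contains BoS")
    return problems


# Token IDs shaped like the debug log in this issue (other IDs are made up).
print(find_bos_problems([128000, 1, 2], [128000, 3], [128000, 4]))
# ['chosen response contains BoS', 'rejected response contains BoS']
```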

Possible solution: No response.

Python version: whatever version the latest Docker image uses.

axolotl branch/commit: the latest commit used by the Docker image.

[X] My issue title is concise, descriptive, and in title casing.
[X] I have searched the existing issues to make sure this bug has not been reported yet.
[X] I am using the latest version of axolotl.
[X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

May 14 '24 14:05 Catgat