alignment-handbook icon indicating copy to clipboard operation
alignment-handbook copied to clipboard

jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

Open Feynman27 opened this issue 1 year ago • 2 comments

When running the DPO script, when calling

    #####################
    # Apply chat template
    #####################
    raw_datasets = raw_datasets.map(
        apply_chat_template,
        fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
        num_proc=data_args.preprocessing_num_workers,
        remove_columns=column_names,
        desc="Formatting comparisons with prompt template",
    )

I'm getting the error:

jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

Feynman27 avatar Jan 09 '24 00:01 Feynman27

I think this has something to do with the tokenizer. I trained an SFT model, and am providing the local path to that model for DPO. If I use the default path from the hub alignment-handbook/zephyr-7b-sft-full, I don't get the error and DPO training starts fine.

Feynman27 avatar Jan 09 '24 21:01 Feynman27

It appears the tokenizer_config.json written to the output model directory during the SFT stage needs to be replaced if loading the SFT model from that same local directory for the DPO phase. I switched out the tokenizer_config from the SFT phase to the one from the model card, and DPO training works now. It looks like all other configs are the same between SFT and DPO (e.g. tokenizer.json).

This was not obvious at all. Can we add a note to the README or make this more fool-proof?

Feynman27 avatar Jan 09 '24 22:01 Feynman27