Hello, I would like to ask: when training the model, do you use only the first round of dialogue from ultrachat_200k?

Open jackwwy opened this issue 1 year ago • 1 comments

def load_and_process_data_ultrachat(dataset_name, split):
    try:
        dataset = load_dataset(dataset_name, split=split)
        reformatted_data = [{
            'generated': [message['messages'][0], {"role": "assistant", "content": ""}],
            'real': [message['messages'][0], message['messages'][1]]
        } for message in dataset]
        return reformatted_data
    except Exception as e:
        logging.error(f"Error loading or processing dataset: {e}")
        return []

Jul 21 '24 12:07 jackwwy

Yes. Only the first round of each real dialogue (the first user message and the first assistant reply) is sampled from ultrachat_200k.
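To illustrate what the snippet in the question does to a multi-round conversation, here is a minimal sketch. It assumes each record has a `messages` list that starts with a user turn followed by an assistant turn, as the indexing in the snippet implies. The helper name `first_round_pair` and the sample conversation are hypothetical and are not taken from the repository.

```python
def first_round_pair(record):
    """Build the 'generated'/'real' pair from the first user/assistant turn only."""
    messages = record["messages"]
    if len(messages) < 2:
        # Assumption: skip or reject records without a complete first round.
        raise ValueError("record needs at least one user and one assistant message")
    prompt, reply = messages[0], messages[1]
    return {
        # Placeholder assistant turn with empty content, as in the snippet above.
        "generated": [prompt, {"role": "assistant", "content": ""}],
        # The real first-round exchange from the dataset.
        "real": [prompt, reply],
    }


# Hypothetical two-round conversation; only the first round should survive.
sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
        {"role": "user", "content": "And 3 + 3?"},
        {"role": "assistant", "content": "6"},
    ]
}

pair = first_round_pair(sample)
print(len(pair["real"]))  # number of messages kept from the real dialogue
```

Any later rounds in the record (here, the second question and answer) are simply never read, so they do not appear in either the `generated` or the `real` entry.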

Nov 04 '24 02:11 junming-yang