SPIN
Hello, I would like to ask: when training the model, do you use only the first round of each dialogue from ultrachat_200k?
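import logging

from datasets import load_dataset


def load_and_process_data_ultrachat(dataset_name, split):
    try:
        dataset = load_dataset(dataset_name, split=split)
        # Keep only the first user/assistant exchange of every conversation.
        reformatted_data = [{
            # Same prompt with an empty assistant turn, to be filled by the model.
            'generated': [message['messages'][0], {"role": "assistant", "content": ""}],
            # Prompt plus the ground-truth first reply from the dataset.
            'real': [message['messages'][0], message['messages'][1]]
        } for message in dataset]
        return reformatted_data
    except Exception as e:
        logging.error(f"Error loading or processing dataset: {e}")
        return []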
Yes. Only the first round of each dialogue in ultrachat_200k is sampled as the real data.
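To make the effect concrete, here is a minimal sketch that applies the same indexing to a single in-memory record, so it runs without downloading anything. The `sample` record and the helper name `first_turn_pair` are made up for illustration and are not part of the repository; the record only mimics the `messages` layout used in the function above.

```python
# Hypothetical toy record that mimics the "messages" layout of ultrachat_200k.
sample = {
    "messages": [
        {"role": "user", "content": "first question"},
        {"role": "assistant", "content": "first answer"},
        {"role": "user", "content": "follow-up question"},
        {"role": "assistant", "content": "follow-up answer"},
    ]
}


def first_turn_pair(record):
    """Build the 'generated'/'real' pair from the first exchange only (illustrative helper)."""
    prompt, reply = record["messages"][0], record["messages"][1]
    return {
        # Prompt with an empty assistant turn, left for the model to fill in.
        "generated": [prompt, {"role": "assistant", "content": ""}],
        # Prompt with the ground-truth first reply.
        "real": [prompt, reply],
    }


pair = first_turn_pair(sample)
print(pair["real"])
```

The follow-up question and answer never appear in `pair`, which matches the reply above: later rounds of each conversation are dropped.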