ms-swift icon indicating copy to clipboard operation
ms-swift copied to clipboard

请问如何在grpo中配置自定义的数据集路径,并进行数据格式转换?

Open luchenhao-luke opened this issue 9 months ago • 2 comments

  1. 我有一个数据集路径:/home/data2/xxx.json
  2. 该数据集格式是: { "instruction":"xxx", "input":"xxx", "output":"xxx" }
  3. 想要按照下面的格式进行数据的组织: SYSTEM_PROMPT = """ 按照如下格式生成: <|begin_of_thought|> ... <|end_of_thought|> <|begin_of_solution|> ... <|end_of_solution|> """ def process_data(data): data = data.map(lambda x: { 'prompt': [ {'role': 'system', 'content': SYSTEM_PROMPT}, {'role': 'user', 'content': x['instruction'] + x['input']} ], 'answer': x['output'] }) return data

想问下如何完成这三步的数据配置?以用来进行grpo的训练

luchenhao-luke avatar Mar 17 '25 03:03 luchenhao-luke

请问你知道了吗

zhangansen avatar May 01 '25 05:05 zhangansen

https://swift.readthedocs.io/zh-cn/latest/Customization/%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86.html#id1

hjh0119 avatar May 01 '25 07:05 hjh0119