ColossalAI

Can you provide a GRPO dataset that runs out of the box?

Open tonylin52 opened this issue 9 months ago • 1 comments

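I want to run GRPO. Following the README, I downloaded the qwedsacf/competition_math dataset.

prepare_prompt_dataset.sh fails right at the start, so I cannot get any further.

I hope the authors can explain the README's reproduction steps in more detail. Thanks.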

tonylin52 avatar Feb 21 '25 01:02 tonylin52

Hi, the project is under intensive development, and we will soon release a new version with more concise documentation and speed optimizations.

TongLi3701 avatar Feb 21 '25 02:02 TongLi3701

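> I want to run GRPO. Following the README, I downloaded the qwedsacf/competition_math dataset.
>
> prepare_prompt_dataset.sh fails right at the start, so I cannot get any further.
>
> I hope the authors can explain the README's reproduction steps in more detail. Thanks.

You need to convert the dataset to JSONL format before running prepare_prompt_dataset.sh.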

zhengdong914 avatar Feb 21 '25 02:02 zhengdong914

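> > I want to run GRPO. Following the README, I downloaded the qwedsacf/competition_math dataset. prepare_prompt_dataset.sh fails right at the start, so I cannot get any further. I hope the authors can explain the README's reproduction steps in more detail. Thanks.
>
> You need to convert the dataset to JSONL format before running prepare_prompt_dataset.sh.

I tried that:

    ds = load_from_disk('./deepseek/data/NuminaMath-TIR')
    ds['train'].to_json("./deepseek/data/NuminaMath-TIR-jsonl/train.jsonl")
    ds['test'].to_json("./deepseek/data/NuminaMath-TIR-jsonl/test.jsonl")

After that it still fails with an error, so I doubt that this code has ever actually been run successfully end to end.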

tonylin52 avatar Feb 24 '25 05:02 tonylin52
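
For reference, the JSONL conversion discussed above can be sketched and sanity-checked with the standard library alone. This is only a minimal illustration of the file format (one JSON object per line), not the project's own preprocessing: the helper names `write_jsonl` and `read_jsonl` are hypothetical, and the `problem`/`solution` field names in the usage note are placeholders, since the thread does not say which fields prepare_prompt_dataset.sh expects.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts to `path`, one JSON object per line.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Read a JSONL file back, failing loudly on the first malformed line."""
    rows = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON line") from err
    return rows
```

For example, `write_jsonl([{"problem": "1+1?", "solution": "2"}], "out/train.jsonl")` followed by `read_jsonl("out/train.jsonl")` round-trips the records. As far as I know, `Dataset.to_json` in Hugging Face `datasets` also writes JSON Lines by default, so if a file produced that way passes a check like `read_jsonl` and the script still fails, the cause is more likely the field names or the script's arguments than the file format itself.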
