data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
### Search before continuing

- [x] I have searched the Data-Juicer issues and found no similar feature requests.

### Description

For example: kafka -> filter1 -> filter2 -> mapper1 -> files or kafka...
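The chain described in this request (source -> filter1 -> filter2 -> mapper1 -> sink) can be illustrated with plain Python generators. This is only a minimal sketch of the streaming idea, not Data-Juicer's API: `stream_filter`, `stream_map`, `run_pipeline`, the sample schema, and the in-memory list standing in for a Kafka consumer are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Sample = dict  # hypothetical sample schema: {"text": str}
Stage = Callable[[Iterator[Sample]], Iterator[Sample]]


def stream_filter(samples: Iterable[Sample],
                  keep: Callable[[Sample], bool]) -> Iterator[Sample]:
    """Lazily yield only the samples for which keep(sample) is true."""
    for sample in samples:
        if keep(sample):
            yield sample


def stream_map(samples: Iterable[Sample],
               fn: Callable[[Sample], Sample]) -> Iterator[Sample]:
    """Lazily apply fn to every sample."""
    for sample in samples:
        yield fn(sample)


def run_pipeline(source: Iterable[Sample],
                 stages: List[Stage]) -> Iterator[Sample]:
    """Chain the stages so each sample flows through them one at a time."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


# Assumption: an in-memory list stands in for a streaming source such as Kafka.
source = [
    {"text": "hello world"},
    {"text": ""},
    {"text": "hi"},
    {"text": "  Data-Juicer  "},
]

stages: List[Stage] = [
    # filter1: drop empty texts
    lambda s: stream_filter(s, lambda x: len(x["text"]) > 0),
    # filter2: drop texts shorter than 5 characters after stripping
    lambda s: stream_filter(s, lambda x: len(x["text"].strip()) >= 5),
    # mapper1: normalize whitespace and case
    lambda s: stream_map(s, lambda x: {**x, "text": x["text"].strip().lower()}),
]

# Assumption: collecting into a list stands in for writing to files.
result = list(run_pipeline(source, stages))
```

Because every stage is a generator, no stage waits for the full dataset; a real streaming source could replace `source` without changing the stages.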
Add EvalscopeEvaluator and MedEvaluator to EvaluateModelHook in DJ-Sandbox. Implement the following features:

- Enable LLM evaluation through [evalscope](https://github.com/modelscope/evalscope) capabilities
- One-stop solution to launch MedEval workflow and generate corresponding radar...
Introduces a novel QA generation module based on self-challenging mechanisms, designed to autonomously synthesize high-quality reasoning-focused question-answer pairs, inspired by the [MindGYM paper](https://arxiv.org/abs/2503.09499).
### Before Asking

- [x] I have read the [README](https://github.com/alibaba/data-juicer/blob/main/README.md) carefully (Chinese version: [README_ZH](https://github.com/alibaba/data-juicer/blob/main/README_ZH.md)).
- [x] I have pulled the latest code of main branch to run again and...
### Search before continuing

- [x] I have searched the Data-Juicer issues and found no similar feature requests.

### Description

In the current...