data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Results 76 data-juicer issues

### Search before continuing - [X] I have searched the Data-Juicer issues and found no similar feature requests. ### Description This proposal is...

enhancement
dj:op

### Before Asking - [X] I have read the [README](https://github.com/alibaba/data-juicer/blob/main/README.md) ([Chinese version](https://github.com/alibaba/data-juicer/blob/main/README_ZH.md)) carefully. - [X] I have pulled the latest code of main branch to run again and...

question

1. Set up a local Actions runner on a GPU machine 2. Use docker-compose to set up a cluster 3. Add more unit tests - single process - multi process - multi process...

enhancement

### Before Asking - [X] I have read the [README](https://github.com/alibaba/data-juicer/blob/main/README.md) ([Chinese version](https://github.com/alibaba/data-juicer/blob/main/README_ZH.md)) carefully. - [X] I have pulled the latest code of main branch to run again and...

question
stale-issue

### Before Asking - [X] I have read the [README](https://github.com/alibaba/data-juicer/blob/main/README.md) ([Chinese version](https://github.com/alibaba/data-juicer/blob/main/README_ZH.md)) carefully. - [X] I have pulled the latest code of main branch to run again and...

enhancement
question
stale-issue

**Issue Description:** Hello. I have discovered a performance degradation in the `read_csv` function in pandas versions below 2.0.1. I also noticed that some parts of the repository depend on pandas 2.0.0...

stale-issue

- Initial version to enrich the multimodal evaluation features, using the GPT4V API to assess models - Further testing and refinement are welcome

enhancement
dj:multimodal
stale-pr

Add tools for FVD, KVD, ISV, PRV, FID, KID, IS, PR eval for videos.

documentation
enhancement
dj:multimodal