data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Results 117 data-juicer issues

### Before Reporting - [X] I have pulled the latest code of main branch to run again and the bug still existed. - [X] I have read the...

bug

### Before Reporting - [X] I have pulled the latest code of main branch to run again and the bug still existed. - [X] I have read the...

bug

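The current RangeSpecifiedFieldSelector class only supports selection by percentile and rank, which is unintuitive. The simplest and most common approach is to select by the value range of a field, for example similarity above some threshold or PPL below some threshold. This PR adds support for that. In addition, the original process function had a problem in its conditional logic (it did not handle defaults in some cases): lower_percentile and lower_rank could not both be None, and upper_percentile and upper_rank could not both be None, otherwise no selection was performed. That does not work when only an upper bound or only a lower bound is given, so this PR also fixes that logic.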
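The value-range selection described in this PR can be illustrated with a minimal sketch. It is not the PR's code: the function name `select_by_value_range`, the parameters `lower` and `upper`, and the inclusive bounds are all assumptions made for illustration. The point it shows is that a bound left as `None` is simply treated as unbounded, so a selection with only an upper bound or only a lower bound still works.

```python
from typing import Any, Dict, List, Optional


def select_by_value_range(
    samples: List[Dict[str, Any]],
    field: str,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep samples whose `field` value lies in [lower, upper].

    Hypothetical helper, not part of data-juicer. A bound that is None
    is treated as unbounded on that side; bounds are inclusive (assumed).
    """
    selected = []
    for sample in samples:
        value = sample.get(field)
        if value is None:
            continue  # samples missing the field are dropped
        if lower is not None and value < lower:
            continue  # below the lower bound
        if upper is not None and value > upper:
            continue  # above the upper bound
        selected.append(sample)
    return selected


# Made-up samples: keep only those with PPL at most 50.
samples = [
    {"text": "a", "ppl": 12.0},
    {"text": "b", "ppl": 85.5},
    {"text": "c", "ppl": 40.0},
]
low_ppl = select_by_value_range(samples, "ppl", upper=50.0)
```

Calling the helper with neither bound returns every sample that has the field, which is the default behaviour one would expect when no range is specified.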

As the title says.

enhancement
dj:op

### Before Reporting - [X] I have pulled the latest code of main branch to run again and the bug still existed. - [X] I have read the...

bug

Add API calling with AgentScope to the demos.

enhancement
agent

Perform segment-anything on images (with FastSAM) and return the bounding box values. Hyperparameters: - imgsz: image resolution after image resizing - conf: confidence score threshold - iou: IoU (Intersection over...

enhancement
dj:multimodal
dj:op
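For the FastSAM item above, the `conf` and `iou` hyperparameters can be illustrated with a small, library-free sketch. It does not use FastSAM or data-juicer. The `(x1, y1, x2, y2)` box format, the helper names, and the reading of `iou` as a threshold for suppressing overlapping boxes are assumptions, since the item's description is truncated.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(
    boxes: List[Box], scores: List[float], conf: float, iou: float
) -> List[Box]:
    """Drop boxes scoring below `conf`, then greedily drop any box whose
    IoU with an already-kept, higher-scoring box exceeds `iou`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if scores[i] < conf:
            continue  # below the confidence threshold
        if all(box_iou(boxes[i], boxes[j]) <= iou for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


# Made-up detections for illustration.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
result = filter_boxes(boxes, scores, conf=0.4, iou=0.5)
```

Under these assumptions, raising `conf` keeps fewer low-scoring boxes, while lowering `iou` suppresses overlapping boxes more aggressively.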

Augment sentences using LLMs. Hyperparameters: - system_prompt: system prompt; - task_sentence: the instruction for the current task; - max_new_tokens: the maximum number of new tokens generated by the model; -...

enhancement
dj:multimodal
dj:op

### Before Asking - [X] I have read the [README](https://github.com/alibaba/data-juicer/blob/main/README.md) ([Chinese version](https://github.com/alibaba/data-juicer/blob/main/README_ZH.md)) carefully. - [X] I have pulled the latest code of main branch to run again and...

question