
[Bug]: The validator's field-type check reads `field_types` from the YAML as strings, so the `isinstance` call in the check raises an exception

Open kongzhinvwang2 opened this issue 1 month ago • 0 comments

Before Reporting

  • [x] I have pulled the latest code of main branch to run again and the bug still existed.

  • [x] I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend that you can ask a question using the Question template)

Search before reporting

  • [x] I have searched the Data-Juicer issues and found no similar bugs.

OS

Linux

Installation Method

pip

Data-Juicer Version

latest

Python Version

3.10

Describe the bug

Configure the validator's `field_types` in the YAML file (taken from the official example config):

    validators:  # validators are a list of validators to be applied when loading a dataset
                 # it checks a sample of the dataset for each validator
                 # check data_juicer/core/data/data_validator.py for more validator options
      - type: 'required_fields'  # required_fields is a validator to check the required fields in the dataset.
        required_fields:  # required_fields is a list of required fields.
          - "text"
        field_types:  # field_types is a dictionary of field types.
          text: 'str'

In data_juicer/core/data/data_validator.py, the expected type is read with `expected_type = self.field_types.get(field)`. Because the value comes straight from the YAML, `expected_type` is a string such as `'str'` or `'list'` rather than a Python type. The check `invalid_types = [type(v) for v in sample_values if v is not None and not isinstance(v, expected_type)]` then passes that string to `isinstance` without converting it to a type first, which raises `TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union`.

To Reproduce

Setting `field_types` for the validator in the YAML config is enough to reproduce the issue.
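For reference, the failure can be shown in isolation without installing Data-Juicer. The snippet below is only a sketch: the `field_types` dict stands in for the parsed YAML, and `find_invalid_types` is a hypothetical helper that mirrors the list comprehension quoted above. It is not the project's actual code.

```python
# Stand-in for the parsed YAML config (assumption: the loader keeps 'str' as a plain string).
field_types = {"text": "str"}
sample_values = ["hello", None, "world"]


def find_invalid_types(values, expected_type):
    """Hypothetical helper mirroring the check quoted in this report."""
    return [
        type(v)
        for v in values
        if v is not None and not isinstance(v, expected_type)
    ]


expected_type = field_types.get("text")  # the string 'str', not the type str

try:
    find_invalid_types(sample_values, expected_type)
    error_message = None
except TypeError as exc:
    # isinstance() rejects a string as its second argument.
    error_message = str(exc)

print(error_message)
```

Passing the real type `str` to the same helper returns an empty list, so the problem appears to be only the missing string-to-type conversion.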

Configs

No response

Logs

TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union

Screenshots

No response

Additional

Converting `expected_type` from its string name to an actual type before the `isinstance` check is enough to fix this.
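One possible shape for that conversion is sketched below. The `TYPE_NAMES` table, `resolve_type`, and `find_invalid_types` are hypothetical names introduced for illustration; the set of supported type names is an assumption, and the real fix in data_validator.py may look different.

```python
# Hypothetical mapping from YAML type names to Python types (assumed set of names).
TYPE_NAMES = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "list": list,
    "dict": dict,
}


def resolve_type(expected_type):
    """Turn a type name such as 'str' into the type itself.

    Values that are already a type (or a tuple of types) pass through unchanged,
    and None means "no type configured for this field".
    """
    if expected_type is None or isinstance(expected_type, (type, tuple)):
        return expected_type
    if isinstance(expected_type, str):
        name = expected_type.strip()
        if name in TYPE_NAMES:
            return TYPE_NAMES[name]
    raise ValueError(f"Unsupported field type: {expected_type!r}")


def find_invalid_types(values, expected_type):
    """Same check as in the report, with the conversion applied first."""
    resolved = resolve_type(expected_type)
    if resolved is None:
        return []  # nothing to check for this field
    return [
        type(v)
        for v in values
        if v is not None and not isinstance(v, resolved)
    ]


print(find_invalid_types(["hello", None, 3], "str"))
```

Using an explicit lookup table instead of `eval` keeps arbitrary strings from the config from being executed, and unknown names fail with a clear `ValueError` rather than a `TypeError` deep inside the check.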

kongzhinvwang2 · Oct 14 '25 08:10