
How do I configure multiple operators that each operate on a different data field?

Open edc3000 opened this issue 4 months ago • 0 comments

Before Asking

  • [x] I have read the README carefully.

  • [x] I have pulled the latest code of main branch to run again and the problem still existed.

Search before asking

  • [x] I have searched the Data-Juicer issues and found no similar questions.

Question

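  1. If the same operator (e.g., document_minhash_deduplicator) should first deduplicate on instruction and then deduplicate on answer, how should the configuration be written? Is this supported?
  2. A related question: how do I configure different operators so that each one processes a different data field?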
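
For concreteness, the kind of configuration being asked about might look like the sketch below. It assumes that each operator entry in the `process` list accepts its own `text_key` argument naming the field to process. That is an assumption for illustration, not confirmed behavior. The field names `instruction` and `answer` come from the question; the `text_length_filter` entry and its `min_len` value are only an example of a second, different operator.

```yaml
# Hypothetical sketch: per-operator field selection via an assumed `text_key` argument.
process:
  # Pass 1: deduplicate samples by the `instruction` field.
  - document_minhash_deduplicator:
      text_key: instruction
  # Pass 2: run the same operator again, this time on the `answer` field.
  - document_minhash_deduplicator:
      text_key: answer
  # A different operator pointed at yet another field (illustrative only).
  - text_length_filter:
      text_key: answer
      min_len: 10
```

If per-operator field selection is not supported, the open question is whether there is another recommended way to express these two passes.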

Additional

No response

edc3000, Jul 04 '25 05:07