feat: support regex pattern in segment

Open hibernate2011 opened this issue 10 months ago • 1 comments

Description

Use a regex pattern to better control the segment result in text cleaning
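As an illustration of the idea (not the code in this PR), here is a minimal sketch of regex-based segmentation using Python's standard `re` module. The helper name `split_by_pattern` and the sample separator are assumptions made up for this example.

```python
import re


def split_by_pattern(text, pattern):
    """Hypothetical helper: split text on a regex separator and drop empty chunks."""
    parts = re.split(pattern, text)
    # re.split can yield empty strings (or None for unmatched groups); skip those.
    return [part.strip() for part in parts if part and part.strip()]


sample = "1. Intro\ntext A\n2. Usage\ntext B"
# Split at every newline that is followed by a numbered heading such as "2. ".
chunks = split_by_pattern(sample, r"\n(?=\d+\.\s)")
```

A fixed string separator cannot express "newline before a numbered heading"; the lookahead keeps the heading text inside the chunk that follows it.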

Type of Change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • [x] Tested in local environment

Suggested Checklist:

  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] My changes generate no new warnings

Next Step:

  • [x] Please help edit the i18n translation for 'separatorPlaceholder' in all dataset-creation.ts files, because I'm afraid my translation won't be accurate
  • [ ] Update the front page to separate the two ways of splitting chunks, making it easier to use

hibernate2011, Apr 12 '24 03:04

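Regarding the ReDoS problem: because the text to be split is not under our control, letting users enter a regex directly is indeed fairly risky. This is especially true for the publicly available Dify Cloud, where a very long input text combined with a malicious regex would consume a large amount of CPU.

1. Limiting the regex length does not solve the problem well, and so far I have not found a suitable regex risk-assessment library that can be called directly from code.
2. Limiting execution time: because execution time depends on the system's load at that moment, a suitable timeout is hard to choose, and by the time the timeout fires the server has already been affected.

For now, it looks like this feature is best kept as an optional feature for private deployments, disabled by default.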
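To make the second option (limiting execution time) concrete, here is a rough sketch that runs the user-supplied regex in a child Python process and kills it on timeout. This is not code from the PR: the helper name `split_with_timeout`, the JSON-over-stdin protocol, and the one-second default are all assumptions for illustration. It also shows the limitation noted above: the child still burns CPU until the timeout fires.

```python
import json
import subprocess
import sys

# Child script: reads {"pattern": ..., "text": ...} from stdin
# and writes the resulting chunks to stdout as JSON.
_CHILD = (
    "import json, re, sys\n"
    "req = json.load(sys.stdin)\n"
    "chunks = [c for c in re.split(req['pattern'], req['text']) if c]\n"
    "json.dump(chunks, sys.stdout)\n"
)


def split_with_timeout(pattern, text, timeout_s=1.0):
    """Hypothetical helper: return the chunks, or None on timeout or error."""
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if done.returncode != 0:  # e.g. the pattern failed to compile
        return None
    return json.loads(done.stdout)


# A short pattern can still backtrack catastrophically, so a length
# limit alone does not help: this call is cut off by the timeout.
result = split_with_timeout(r"(a+)+$", "a" * 40 + "b", timeout_s=1.0)
```

Isolating the match in a separate process keeps the main worker responsive, but each request pays process start-up cost and a malicious pattern still occupies a core for the full timeout, which is the trade-off described in the comment.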

hibernate2011, Apr 12 '24 10:04

Close for now

crazywoola, Apr 18 '24 09:04