dify
feat: support regex pattern in segment
Description
Use a regex pattern to better control the segmentation result during text cleaning.
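A minimal sketch of the idea, assuming the separator is handed to Python's standard `re` module. The function name `split_by_pattern` is hypothetical and is not taken from the Dify codebase:

```python
import re
from typing import List


def split_by_pattern(text: str, pattern: str) -> List[str]:
    """Split text on every match of a regex separator (hypothetical helper)."""
    chunks = re.split(pattern, text)
    # Drop empty or whitespace-only pieces and trim the remaining ones.
    return [c.strip() for c in chunks if c and c.strip()]


# One pattern covers blank lines with or without stray whitespace,
# which a single literal separator such as "\n\n" cannot do.
segments = split_by_pattern("Intro\n\n\nBody text\n \nEnd", r"\n\s*\n")
```

With a literal separator, each of these variants would need its own rule; a pattern handles them in one pass.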
Type of Change
Please delete options that are not relevant.
- [x] New feature (non-breaking change which adds functionality)
How Has This Been Tested?
- [x] Tested in local environment
Suggested Checklist:
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] My changes generate no new warnings
Next Step:
- [x] Please help edit the i18n translations for 'separatorPlaceholder' in all dataset-creation.ts files, because I'm afraid my translation won't be accurate
- [ ] Update the frontpage to separate the two ways of splitting chunks, to make it easier to use
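Regarding the ReDoS problem: the text to be split is not under our control, so letting users enter a regex directly is indeed risky, especially for the publicly available Dify Cloud. A very long input text combined with a malicious regex would consume a large amount of CPU.
1. Limiting the regex length does not solve the problem well, and so far I have not found a suitable regex risk-assessment library that can be called directly from code.
2. Limiting the execution time is hard to tune, because execution time depends on the system's load at that moment, and by the time the timeout fires the server has already been affected.
For now, it looks like this feature is best offered only as an optional feature for private deployments, disabled by default.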
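The conclusion above, an opt-in feature that is off by default, could be sketched roughly as follows. The environment variable `SEGMENT_REGEX_ENABLED` and both function names are hypothetical and are not taken from the Dify codebase. The flag does not remove the ReDoS risk; it only restricts it to deployments whose operator has explicitly accepted it.

```python
import os
import re
from typing import List, Mapping, Optional


def regex_separator_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if the operator explicitly opted in (hypothetical flag)."""
    env = os.environ if env is None else env
    return env.get("SEGMENT_REGEX_ENABLED", "false").lower() == "true"


def split_segment(text: str, separator: str,
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split text, treating the separator as a regex only when the flag is on."""
    if not separator:
        # An empty separator cannot split anything meaningful.
        return [text] if text else []
    if regex_separator_enabled(env):
        parts = re.split(separator, text)
    else:
        # Default path: literal split, so no user-supplied pattern is compiled.
        parts = text.split(separator)
    return [p for p in parts if p]


# Flag on: the separator is interpreted as a pattern.
enabled = split_segment("a1b22c", r"\d+", env={"SEGMENT_REGEX_ENABLED": "true"})
# Flag absent: the same string is treated literally and does not match.
disabled = split_segment("a1b22c", r"\d+", env={})
```

Keeping the literal split as the default path means a shared deployment never compiles a user-supplied pattern unless the operator has turned the flag on.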
Close for now