Question about multiple consecutive line breaks in txt files uploaded to semantic-search

Open bruce0210 opened this issue 2 years ago • 1 comment

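https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/pipelines/nodes/file_converter/txt.py Currently I manually reduce every break in the file to a single line break before uploading it to build the index. Could the converter handle uploaded txt documents so that runs of multiple line breaks, which are hard to avoid in txt files, do not affect index building?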

bruce0210, Jul 27 '22 09:07

I tried it: a lot of empty texts show up, but the impact is minor.

w5688414, Jul 28 '22 07:07
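
For readers hitting the same empty-text behavior, here is a minimal sketch of one way to pre-clean a txt file before uploading it. It is not the code in `txt.py`; the helper name `split_nonempty_lines` is hypothetical, and it assumes each non-blank line should become one text entry.

```python
from typing import List


def split_nonempty_lines(text: str) -> List[str]:
    """Split raw txt content into lines and drop the blank ones.

    Hypothetical helper for illustration only; it is not part of PaddleNLP.
    """
    # str.splitlines() handles "\n", "\r\n" and "\r" line endings alike.
    lines = text.splitlines()
    # Strip surrounding whitespace and keep only lines that still have content,
    # so runs of consecutive line breaks never produce empty entries.
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    raw = "first paragraph\n\n\nsecond paragraph\r\n\r\n third paragraph \n"
    for entry in split_nonempty_lines(raw):
        print(repr(entry))
```

The cleaned lines could then be joined with single `\n` characters and written back to a file before upload, which matches the manual workaround described in the issue.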

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot], Dec 08 '22 02:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Dec 22 '22 16:12