PaddleNLP
Handling multiple consecutive line breaks in txt files uploaded to semantic-search
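https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/pipelines/nodes/file_converter/txt.py At the moment I manually edit every file so that paragraphs are separated by exactly one line break before uploading it to build the index. Could the converter handle this itself? Txt documents inevitably contain runs of several line breaks, and it would be good if uploading such a file did not affect index construction.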

I tried it. A lot of empty texts are produced, but the impact is minor.
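
For anyone who wants to avoid the empty texts before uploading, here is a minimal preprocessing sketch. It is not the code in `txt.py`; `split_paragraphs` is a hypothetical helper name, and it assumes each non-empty line should become one text to index.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split raw txt content into non-empty paragraphs.

    Hypothetical helper, not part of PaddleNLP pipelines.
    """
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat any run of one or more newlines as a single separator.
    parts = (part.strip() for part in re.split(r"\n+", text))
    # Drop pieces that are empty or were whitespace-only lines.
    return [part for part in parts if part]


if __name__ == "__main__":
    sample = "first paragraph\n\n\nsecond paragraph\r\n   \r\nthird paragraph\n"
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```

The cleaned paragraphs can be written back to a txt file with one line break between them, which matches the format the upload currently expects.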
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.