Long-text word segmentation is very slow and reports: part of speech not recognized: 'gwz'

Open · ChangweiLi opened this issue 7 years ago · 1 comment

ChangweiLi · Jul 09 '18 02:07

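1. 'gwz' is an internal part-of-speech tag; removing Field.pdat from the Data directory gets rid of it.
2. Loading a long text into memory in one go is inherently slow. I suggest using FileProcess, or splitting the text into smaller chunks and processing them one segment at a time.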

Best wishes
Dr. Kevin Zhang (张华平, Zhang Hua-Ping)
Associate Professor, Graduate Supervisor
Director, Big Data Search and Mining Lab (Beijing Engineering Research Center of Massive Language Information Processing and Cloud Computing Application)
Beijing Institute of Technology
Add: No.5, South St., Zhongguancun, Haidian District, Beijing, P.R.C. PC: 100081
Tel: +86-10-68918642
Website: http://www.nlpir.org (Natural Language Processing and Information Retrieval Sharing Platform)
http://www.bigdataBBS.com (Big Data Forum)
Weibo: http://www.weibo.com/drkevinzhang/
WeChat official account: 大数据千人会 (Thousands of Big Data Experts)
GitHub: https://github.com/NLPIR-team/NLPIR

Dr-Kevin-Zhang · Jul 10 '18 04:07
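
The second suggestion in the reply (split the text and process it piece by piece) could look roughly like the Python sketch below. It is only an illustration under some assumptions: `iter_chunks`, `segment_long_text`, and the `max_chars` default are hypothetical names and values, not part of NLPIR, and `segment` stands for whatever segmenter callable is in use (for instance `pynlpir.segment` after `pynlpir.open()`, if that wrapper is the one being used; the thread does not say).

```python
from typing import Callable, Iterator, List

# Characters treated as sentence terminators (Chinese and ASCII) plus newline.
SENTENCE_END = "。！？!?；;\n"


def iter_chunks(text: str, max_chars: int = 2000) -> Iterator[str]:
    """Yield pieces of `text` no longer than `max_chars`.

    Each piece ends just after a sentence terminator when one exists
    inside the current window; otherwise it is cut at `max_chars`.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    start, n = 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Position of the last terminator in text[start:end], or -1.
            cut = max(text.rfind(ch, start, end) for ch in SENTENCE_END)
            if cut >= start:
                end = cut + 1
        yield text[start:end]
        start = end


def segment_long_text(
    text: str,
    segment: Callable[[str], List],
    max_chars: int = 2000,
) -> List:
    """Run `segment` on each chunk and concatenate the results.

    `segment` is supplied by the caller (hypothetical here); it must
    accept a string and return a list of tokens.
    """
    tokens: List = []
    for chunk in iter_chunks(text, max_chars):
        tokens.extend(segment(chunk))
    return tokens
```

Cutting at sentence boundaries keeps words from being split across two chunks, which matters because a segmenter only sees one chunk at a time. The right `max_chars` depends on the machine and the text, so it is worth trying a few values.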