Ruanchong issues

Results: 2 issues by Ruanchong

model performance and training speed

Hi anantzoid, could you please provide some more information about the convergence of the model? For example: how long does the training procedure take? What device do you use? How about the...

A few suggestions

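If the goal is to build an industrial-strength word segmentation tool, then:

1. Clean up the code so that it follows PEP8.
2. Supporting Python 2 is not recommended. Python 2 is about to be retired, so the effort is not worth it.
3. Host a copy of the model files on GitHub (see [distributing-large-binaries](https://help.github.com/articles/distributing-large-binaries/)) or on S3.
4. Give feedback while the model is loading (emit log messages rather than printing directly to the console), so that users know when loading has finished. Otherwise they may mistakenly think that segmentation itself took a long time.
5. Compare more thoroughly against other segmentation tools (such as hanlp, LTP, etc.), and add performance benchmarks (for example, how many words can be processed per second).
6. Add special handling for punctuation, digits, and similar tokens.
7. Add C++/Java interfaces (if the tool only does inference, rewriting the CRF part in C++ is actually the better option).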
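
To make suggestions 4 and 5 concrete, here is a minimal sketch in Python. It is not this project's code: `load_model`, `segment`, `words_per_second`, the logger name, and the `model.bin` path are all hypothetical stand-ins, and the "segmenter" is just a whitespace split so the example runs on its own. It only illustrates reporting model loading through the standard `logging` module and measuring throughput in words per second.

```python
import logging
import time

# Hypothetical logger name; a real package would use its own module name.
logger = logging.getLogger("segmenter_demo")


def load_model(path):
    """Stand-in for an expensive model load that reports progress via logging."""
    logger.info("Loading model from %s ...", path)
    start = time.perf_counter()
    model = {"path": path}  # placeholder: a real tool would deserialize weights here
    logger.info("Model loaded in %.3f s", time.perf_counter() - start)
    return model


def segment(model, sentence):
    """Placeholder segmenter: whitespace split standing in for the real tokenizer."""
    return sentence.split()


def words_per_second(model, sentences, repeats=3):
    """Return the best throughput (words per second) over several timed passes."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        n_words = sum(len(segment(model, s)) for s in sentences)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, n_words / elapsed)
    return best


if __name__ == "__main__":
    # The application, not the library, decides where log records go.
    logging.basicConfig(level=logging.INFO)
    model = load_model("model.bin")
    corpus = ["this is a small test sentence"] * 10000
    print(f"{words_per_second(model, corpus):.0f} words/s")
```

Because the library only calls `logger.info` and leaves handler configuration to the caller, users can silence or redirect the loading messages, which is the main advantage over printing to the console. Timing only the `segment` calls keeps model loading out of the benchmark figure.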