hankcs
Hi, maybe your system encoding is not correct. It works fine on my Mac:
```
Python 3.6.4 (default, Mar 22 2018, 13:54:22)
from convert_corpus import preprocess
preprocess('充满')
Out[3]: ['充满']
```
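A quick way to confirm whether the problem is an encoding mismatch is to check what the interpreter and the locale report. This is a generic sanity check, not part of the project's code:

```python
import sys
import locale

# On Python 3 the interpreter's default encoding should be 'utf-8';
# if the locale disagrees, Chinese input such as '充满' can arrive
# garbled before preprocess() ever sees it.
print(sys.getdefaultencoding())            # expect 'utf-8'
print(locale.getpreferredencoding(False))  # should also be a UTF-8 variant
```

If the second line prints something like `ANSI_X3.4-1968` (plain ASCII), fix the shell locale (e.g. `export LANG=en_US.UTF-8`) before rerunning.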
Hi, the paper is currently under review. Hopefully we'll release the preprint version soon after acceptance.
Hi, before acceptance please refer to Strubell's paper. Actually, our paper is more of a naive application or an experiment report. We didn't make many improvements, except for the subchar...
I think so. You need to use a version close to 1.2.
https://github.com/hankcs/ID-CNN-CWS/issues/5
Hi, the character embeddings are from here: https://github.com/hankcs/sub-character-cws Yes, they're trained with radical features.
I installed it by compiling from source. The first step is to install MKL (optional, but it greatly speeds up the CPU): http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/11544/l_mkl_2017.3.196.tgz Then run the following commands:
```
#!/usr/bin/env bash
git clone https://github.com/clab/dynet.git
hg clone https://bitbucket.org/eigen/eigen -r 346ecdb  # -r NUM specifies a known working revision
cd dynet
git checkout 2.0.1
mkdir build
cd...
```
The DyNet version doesn't match; it must be 2.0.1: https://github.com/clab/dynet/releases/tag/2.0.1
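Since a version mismatch is a recurring cause of these errors, a small guard that fails fast when the installed package is not the pinned release can save debugging time. This is a generic sketch; the distribution name passed in (e.g. `"dyNET"` for DyNet on PyPI) is an assumption, not something from this thread:

```python
from importlib.metadata import version, PackageNotFoundError

def check_pinned(pkg: str, pinned: str) -> bool:
    """Return True only when `pkg` is installed at exactly the pinned version."""
    try:
        return version(pkg) == pinned
    except PackageNotFoundError:
        # Not installed at all also counts as a mismatch.
        return False

# Hypothetical usage before loading a saved model:
# assert check_pinned("dyNET", "2.0.1"), "please install DyNet 2.0.1"
```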
Thanks for using it; that's an exciting result. My word embeddings (actually char embeddings) take character-composition information such as radicals into account, and are then trained with fastText's General Continuous Skip-Gram (SG) Model. For the principles behind these character vectors, please refer to https://arxiv.org/pdf/1712.08841.pdf
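The sub-character idea above can be illustrated with a tiny tokenization sketch: each character is emitted alongside a radical pseudo-token, so the skip-gram model shares statistics across characters containing the same radical. The radical table and the `<rad>` tag below are hypothetical illustrations, not the actual preprocessing from the paper:

```python
# Tiny hypothetical radical table; the real resource covers the full charset.
RADICALS = {"满": "氵", "河": "氵", "江": "氵"}

def subchar_tokens(text: str) -> list[str]:
    """Emit each character plus a radical-tagged pseudo-token when known."""
    toks = []
    for ch in text:
        toks.append(ch)
        if ch in RADICALS:
            toks.append("<rad>" + RADICALS[ch])  # shared across 满/河/江 etc.
    return toks

print(subchar_tokens("充满"))  # → ['充', '满', '<rad>氵']
```

Feeding such augmented token streams to a standard skip-gram trainer is one way to get radical-aware character vectors.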
Sorry, that article is quite old and the model was not kept. I suggest installing MKL on your server before training.