Batch_Parallel_LatticeLSTM

Chinese NER using Lattice LSTM. Reproduction for ACL 2018 paper.

Chinese

English

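LatticeLSTM with batch-parallel support

  • Original paper: https://arxiv.org/abs/1805.02023
  • With a batch size of 10, it already runs noticeably faster than the original code
  • To run it, add the paths of the three embedding files and of the corresponding datasets in main.py (for download links to the embedding files used in the paper, see https://github.com/jiesutd/LatticeLSTM)
  • This code has been added to fastNLP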

Environment:

  • python >= 3.7.3
  • fastNLP >= dev.0.5.0
  • pytorch >= 1.1.0
  • numpy >= 1.16.4
  • fitlog >= 0.2.0

Supported datasets:

Datasets not included can be supported by adding a function whose output format matches that of load_ontonotes4ner in load_data.py

Performance:

Dataset      F1 achieved so far (test)    F1 in paper (test)
Weibo        58.66 (possibly incorrect)   58.79
Resume       95.18                        94.46
OntoNotes    73.62                        73.88

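Note: the Weibo dataset I use is V2, the updated version. According to an issue in Dr. Jie Yang's LatticeLSTM repository on GitHub, it should be consistent.

If you have any questions, please contact: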


Batch Parallel LatticeLSTM

  • Paper: https://arxiv.org/abs/1805.02023
  • With a batch size of 10, it already runs noticeably faster than the original code
  • Set the paths of the three embedding files and of the corpus in main.py before running it. The embeddings used in the paper can be downloaded via https://github.com/jiesutd/LatticeLSTM
  • This code has been added to fastNLP
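Before launching main.py it can help to confirm that every configured path exists. The snippet below is only a sketch: the key names, the placeholder paths, and the assumption that the three embeddings are character, bigram, and word embeddings are all hypothetical, so match them to the variables main.py actually uses.

```python
from pathlib import Path

# Hypothetical names and placeholder paths (assumptions, not taken from main.py).
RESOURCE_PATHS = {
    "char_embedding": "/path/to/char_embedding.vec",
    "bigram_embedding": "/path/to/bigram_embedding.vec",
    "word_embedding": "/path/to/word_embedding.vec",
    "dataset_dir": "/path/to/dataset",
}


def find_missing(paths):
    """Return the sorted names whose path does not exist on disk."""
    return sorted(name for name, p in paths.items() if not Path(p).exists())


if __name__ == "__main__":
    missing = find_missing(RESOURCE_PATHS)
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All configured paths exist.")
```

Running the check first gives a readable message for a mistyped path instead of a failure partway through embedding loading.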

Environment:

  • python >= 3.7.3
  • fastNLP >= dev.0.5.0
  • pytorch >= 1.1.0
  • numpy >= 1.16.4
  • fitlog >= 0.2.0
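The minimum versions above can be checked programmatically. This is a minimal sketch, assuming each package exposes a `__version__` attribute (true for numpy and torch; treat it as an assumption for the others). The helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and not part of this repository.

```python
import importlib
import re
import sys

# Import names and minimum versions taken from the list above.
MINIMUMS = {
    "fastNLP": "dev.0.5.0",
    "torch": "1.1.0",
    "numpy": "1.16.4",
    "fitlog": "0.2.0",
}


def parse_version(text):
    """Extract up to three numeric components, e.g. 'dev.0.5.0' -> (0, 5, 0)."""
    # Drop local build suffixes such as '+cu118' before reading the digits.
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def meets_minimum(installed, required):
    """Compare versions numerically, so that '1.9.0' is older than '1.16.4'."""
    return parse_version(installed) >= parse_version(required)


def check_environment(minimums=MINIMUMS):
    """Return {name: bool}; False means missing, unversioned, or too old."""
    report = {"python": sys.version_info[:3] >= (3, 7, 3)}
    for module_name, required in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = False
            continue
        installed = getattr(module, "__version__", None)
        report[module_name] = installed is not None and meets_minimum(installed, required)
    return report
```

Calling `check_environment()` returns a dictionary in which any `False` entry points at a package to install or upgrade. Comparing numeric tuples rather than raw strings matters here, because a plain string comparison would rank "1.9.0" above "1.16.4".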

Dataset:

  • Resume, downloaded from here
  • OntoNotes
  • Weibo

To support a dataset that is not included, write a loader function whose output format matches that of load_ontonotes4ner in load_data.py.
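As a starting point for such a loader, the sketch below reads a corpus into per-sentence character and tag lists. Everything in it is an assumption rather than the repository's actual interface: the function names, the `train.txt` / `dev.txt` / `test.txt` file layout, the one-character-and-tag-per-line format with blank lines between sentences, and the `chars` / `target` field names. Check load_ontonotes4ner in load_data.py for the real return value, which this sketch does not reproduce.

```python
from pathlib import Path


def read_char_tag_file(path):
    """Parse 'char tag' lines into sentences; blank lines separate sentences."""
    sentences = []
    chars, tags = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                # Blank line: close the current sentence, if any.
                if chars:
                    sentences.append({"chars": chars, "target": tags})
                    chars, tags = [], []
                continue
            chars.append(parts[0])   # first column: the character
            tags.append(parts[-1])   # last column: the NER tag
    if chars:
        # The file may not end with a blank line.
        sentences.append({"chars": chars, "target": tags})
    return sentences


def load_custom_ner(data_dir, splits=("train", "dev", "test")):
    """Hypothetical loader: returns {split: [sentence, ...]} for each split."""
    data_dir = Path(data_dir)
    return {split: read_char_tag_file(data_dir / f"{split}.txt") for split in splits}
```

The resulting lists still need to be converted into whatever structure load_ontonotes4ner produces before they can be passed to the training code.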

Performance:

Dataset      F1 of this code (test)    F1 in paper (test)
Weibo        58.66 (may be wrong)      58.79
Resume       95.18                     94.46
OntoNotes    73.62                     73.88

PS: The Weibo dataset I use is V2, the revised version.

If you have any questions, please contact: