ME-CNER
Code for CIKM 2019 paper "Exploiting Multiple Embeddings for Chinese Named Entity Recognition".
Citation
If you use this code in your work, please cite our paper:
@inproceedings{cikm19:xu,
author = {Canwen Xu and
Feiyang Wang and
Jialong Han and
Chenliang Li},
title = {Exploiting Multiple Embeddings for Chinese Named Entity Recognition},
booktitle = {The 28th ACM International Conference on Information and Knowledge Management, {CIKM} 2019, Beijing, China,
November 3-7, 2019},
publisher = {{ACM}},
year = {2019},
url = {https://doi.org/10.1145/3357384.3358117},
doi = {10.1145/3357384.3358117}
}
Requirements
Python: 3.6
Keras: 2.2.2
Keras-contrib: 2.0.8
jieba: 0.39
Dataset
We use the standard Weibo NER dataset (Peng and Dredze, 2015) and the MSRA news dataset, a formal-text corpus (Levow, 2006).
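Both corpora are labeled one character at a time. The sketch below assumes a CoNLL-style layout (one character and its tag per line, blank lines between sentences) and groups such lines into sentences. The layout, the example tags such as B-PER.NAM, and the function name read_char_tagged are assumptions for illustration; the format actually consumed by concat_data.py may differ.

```python
from typing import Iterable, List, Tuple


def read_char_tagged(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Group 'char tag' lines into (chars, tags) sentences.

    Assumed layout (not confirmed by this repository): first column is the
    character, last column is the tag, blank lines separate sentences.
    """
    sentences: List[Tuple[List[str], List[str]]] = []
    chars: List[str] = []
    tags: List[str] = []
    for line in lines:
        parts = line.split()
        if not parts:  # a blank line closes the current sentence
            if chars:
                sentences.append((chars, tags))
                chars, tags = [], []
            continue
        chars.append(parts[0])
        tags.append(parts[-1])
    if chars:  # the file may not end with a blank line
        sentences.append((chars, tags))
    return sentences


# Hypothetical sample in the assumed layout.
sample = [
    "王 B-PER.NAM", "小 I-PER.NAM", "明 I-PER.NAM", "好 O", "",
    "妈 B-PER.NOM", "妈 I-PER.NOM",
]
print(read_char_tagged(sample))
```

Because the function accepts any iterable of lines, an open file handle (opened with encoding="utf-8") can be passed directly.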
Pretrained Embeddings
The pretrained character and word embeddings are provided by Tencent AI Lab; download them here.
The radical embedding is randomly initialized.
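A minimal sketch of how pretrained and randomly initialized vectors can be combined into one lookup matrix (for example, to pass as the initial weights of a Keras Embedding layer). It assumes the Tencent file is in word2vec text format (an optional "<count> <dim>" header, then a token followed by its values on each line), that row 0 is reserved for padding, and that random vectors are drawn uniformly from [-0.25, 0.25]. The function names and the initialization range are illustrative and are not taken from this repository.

```python
from typing import Dict, Iterable, Set

import numpy as np


def load_text_vectors(lines: Iterable[str], vocab: Set[str]) -> Dict[str, np.ndarray]:
    """Parse word2vec-style text lines ('token v1 v2 ...'), keeping only tokens in vocab."""
    vectors: Dict[str, np.ndarray] = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        if i == 0 and len(parts) == 2:
            continue  # assumed header line: '<count> <dim>'
        token, values = parts[0], parts[1:]
        if token in vocab:
            vectors[token] = np.array([float(v) for v in values], dtype=np.float32)
    return vectors


def build_embedding_matrix(index: Dict[str, int], vectors: Dict[str, np.ndarray],
                           dim: int, seed: int = 0) -> np.ndarray:
    """Row i gets the pretrained vector of the token with index i, else a random vector.

    Assumes indices run from 1 to len(index); row 0 is kept as an all-zero padding row.
    """
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(index) + 1, dim)).astype(np.float32)
    matrix[0] = 0.0
    for token, i in index.items():
        if token in vectors:
            matrix[i] = vectors[token]
    return matrix


# Characters/words: pretrained where available (hypothetical toy data).
lines = ["2 3", "北京 0.1 0.2 0.3", "你好 1 2 3"]
vecs = load_text_vectors(lines, {"北京", "上海"})
word_matrix = build_embedding_matrix({"北京": 1, "上海": 2}, vecs, dim=3)

# Radicals: no pretrained vectors, so every row is randomly initialized.
radical_matrix = build_embedding_matrix({"氵": 1, "木": 2}, {}, dim=4)
```

Reusing one function for both cases keeps the radical embedding a special case (an empty vectors dict) of the pretrained path.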
How to Run
- Install all requirements
pip install keras==2.2.2 # for Keras
pip install git+https://www.github.com/keras-team/keras-contrib.git # for CRF layer
pip install jieba # for word segmentation
- Download the pretrained embeddings: download the Tencent embeddings, extract them, and put them in process_data/data_preprocess.
- Run the pre-processing code
python concat_data.py
- Run the model (with different configs)
python main.py --dataset ${weibo/msra} --with_radical ${1/0} --network ${convgru/cnn/bilstm} \
--tagger ${bigrucrf/bilstmcrf} --entity_type ${all/nm/ne}
dataset:
weibo
msra
with_radical: # whether to input radical embeddings
0 # no radical embeddings; only word and character embeddings
1 # with radical embeddings
network: # for characters
convgru # Conv-GRU
bilstm # Bidirectional LSTM
cnn # Convolutional neural network
tagger:
bigrucrf # Bidirectional GRU-CRF
bilstmcrf # Bidirectional LSTM-CRF
entity_type:
ne # only named entities, e.g. 王小明 (Xiaoming Wang), 北京市 (Beijing City)
nm # only nominal mentions, e.g. 班长 (class president), 妈妈 (mother)
all # both named entities and nominal mentions
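One plausible way to implement the entity_type option is to relabel the excluded kind of mention as O before training and evaluation. The sketch below assumes Weibo-style tags ending in .NAM (named entity) or .NOM (nominal mention), such as B-PER.NAM or I-PER.NOM. The helper filter_tags is hypothetical and may not match how main.py handles the option.

```python
from typing import List

# Assumed mapping from --entity_type to the tag suffix that is kept.
_KEEP_SUFFIX = {"ne": ".NAM", "nm": ".NOM"}


def filter_tags(tags: List[str], entity_type: str) -> List[str]:
    """Relabel tags of the excluded mention kind as 'O'; 'all' keeps every tag."""
    if entity_type == "all":
        return list(tags)
    if entity_type not in _KEEP_SUFFIX:
        raise ValueError("entity_type must be one of: all, ne, nm")
    keep = _KEEP_SUFFIX[entity_type]
    return [t if t == "O" or t.endswith(keep) else "O" for t in tags]


tags = ["B-PER.NAM", "I-PER.NAM", "O", "B-PER.NOM", "I-PER.NOM"]
print(filter_tags(tags, "ne"))  # nominal mentions become O
print(filter_tags(tags, "nm"))  # named entities become O
```

Relabeling whole spans keeps the B-/I- sequence valid, since every tag of an excluded mention is changed together.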
For example, the following command runs our final ME-CNER model on the Weibo dataset and recognizes only named entities (all nominal mentions are ignored):
python main.py --dataset weibo --with_radical 1 --network convgru --tagger bigrucrf --entity_type ne
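To sweep every documented configuration, the command lines can be generated instead of typed by hand. This sketch uses only the standard library and the flag values listed above. It prints the commands by default and runs them one after another only when the script itself is given --run, which is an option of this sketch, not of main.py. Some combinations (for instance, --entity_type nm on MSRA) may not be meaningful; this has not been checked against main.py.

```python
import itertools
import shlex
import subprocess
import sys
from typing import List

# Flag values copied from the option list in this README.
OPTIONS = {
    "--dataset": ["weibo", "msra"],
    "--with_radical": ["1", "0"],
    "--network": ["convgru", "cnn", "bilstm"],
    "--tagger": ["bigrucrf", "bilstmcrf"],
    "--entity_type": ["all", "nm", "ne"],
}


def build_commands(python: str = "python") -> List[List[str]]:
    """Return one main.py argv list per combination of option values."""
    flags = list(OPTIONS)
    commands = []
    for values in itertools.product(*(OPTIONS[f] for f in flags)):
        cmd = [python, "main.py"]
        for flag, value in zip(flags, values):
            cmd += [flag, value]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    run = "--run" in sys.argv  # this sketch's own switch; default is a dry run
    for cmd in build_commands(sys.executable):
        print(" ".join(shlex.quote(c) for c in cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failing configuration
```

Passing argv lists to subprocess.run avoids shell quoting issues; shlex.quote is used only to print copy-pasteable commands.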