korean-ner-pytorch
Korean NER with PyTorch
Korean NER Task with CharCNN + BiLSTM + CRF (with Naver NLP Challenge dataset), implemented with PyTorch
Model
- Character embedding with CNN
- Concatenate the word embedding with the character representation
- Feed the combined features into BiLSTM + CRF
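To make the data flow concrete without a PyTorch install, here is a framework-free sketch of two pieces of the pipeline. The first is a character CNN (1-D convolution with max-over-time pooling; bias and activation omitted for brevity) whose output is concatenated with the word embedding. The second is Viterbi decoding, which is how a CRF layer picks the best tag sequence from per-token emission scores and tag-transition scores. The function names, the start/end score vectors, and the list-of-lists representation are illustrative assumptions; the repository itself uses PyTorch and pytorch-crf, whose internals may differ.

```python
from typing import List, Sequence

Vector = List[float]


def char_cnn(char_vecs: Sequence[Vector], filters: Sequence[Sequence[Vector]]) -> Vector:
    """Convolve each filter over the character embeddings and max-pool over time.

    Each filter is a list of `width` weight rows, each the size of one char embedding.
    Assumes the word has at least one character.
    """
    dim = len(char_vecs[0])
    features = []
    for filt in filters:
        width = len(filt)
        # Zero-pad words shorter than the filter so they still yield one window.
        padded = list(char_vecs) + [[0.0] * dim for _ in range(width - len(char_vecs))]
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            score = sum(w * x for w_row, x_row in zip(filt, window) for w, x in zip(w_row, x_row))
            best = max(best, score)  # max-over-time pooling
        features.append(best)
    return features


def token_features(word_vec: Vector, char_vecs: Sequence[Vector],
                   filters: Sequence[Sequence[Vector]]) -> Vector:
    """Concatenate the word embedding with its character-level representation."""
    return list(word_vec) + char_cnn(char_vecs, filters)


def viterbi(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]],
            start: Sequence[float], end: Sequence[float]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]: score of tag j at position t; transitions[i][j]: score of moving
    from tag i to tag j; start/end: scores for the first/last tag.
    """
    n_tags = len(emissions[0])
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    final = [score[j] + end[j] for j in range(n_tags)]
    path = [max(range(n_tags), key=lambda j: final[j])]
    for ptrs in reversed(backpointers):  # follow backpointers from the last position
        path.append(ptrs[path[-1]])
    path.reverse()
    return path
```

In the actual model the concatenated vectors go through a BiLSTM, and its per-token outputs play the role of `emissions`; the transition scores are learned parameters of the CRF.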
Dependencies
- python>=3.5
- torch==1.4.0
- seqeval==0.0.12
- pytorch-crf==0.7.2
- gdown==3.10.1
Data
| | Train | Test |
|---|---|---|
| # of Data | 81,000 | 9,000 |
- Naver NLP Challenge 2018 NER Dataset (GitHub link)
- The original GitHub repository provides only a training set, so the test set was created by splitting the training data. (Data link)
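The README does not say how the split was made. A minimal sketch of one way to do it, assuming a seeded random shuffle followed by a 9,000-sentence hold-out (81,000 + 9,000 = 90,000 sentences); the function name and seed are hypothetical, so this will not reproduce the published split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(examples: Sequence[T], test_size: int,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples with a fixed seed, then hold out `test_size` of them."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    return items[test_size:], items[:test_size]
```

Each example should be a whole sentence (tokens plus tags) so that tokens and labels stay aligned after shuffling.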
Pretrained Word Vectors
- Use 300-dimensional Korean fastText vectors.
- Loading the original vector file takes quite a long time, so I extracted only the vectors for words in the word vocab.
- The filtered vectors are downloaded automatically when you run `main.py`.
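A minimal sketch of that filtering step, assuming the standard fastText `.vec` text format (a header line with the vector count and dimension, then one word and its values per line). The function names are hypothetical, and the repository's own preprocessing may differ.

```python
from typing import Dict, List, Set, TextIO


def load_filtered_vectors(f: TextIO, vocab: Set[str]) -> Dict[str, List[float]]:
    """Read a .vec file, keeping only vectors for words in `vocab`."""
    dim = int(f.readline().split()[1])  # header: "<num_words> <dim>"
    vectors = {}
    for line in f:
        parts = line.rstrip().split(" ")  # .vec lines may carry a trailing space
        word, values = parts[0], parts[1:]
        if word in vocab and len(values) == dim:
            vectors[word] = [float(v) for v in values]
    return vectors


def save_vectors(f: TextIO, vectors: Dict[str, List[float]], dim: int) -> None:
    """Write the filtered vectors back out in the same .vec format."""
    f.write("{} {}\n".format(len(vectors), dim))
    for word, vec in vectors.items():
        f.write(word + " " + " ".join(str(v) for v in vec) + "\n")
```

Running this once offline and shipping the much smaller filtered file avoids re-reading the full vector file on every training run.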
Usage
```bash
$ python3 main.py --do_train --do_eval
```
- Evaluation predictions are saved in the `preds` directory when you pass the `--write_pred` option.
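The three flags above act as boolean switches. A hypothetical sketch of how such a parser could be declared with `argparse`; the actual `main.py` likely defines more options (paths, hyperparameters) and may set these up differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: mirrors only the three flags mentioned in this README.
    parser = argparse.ArgumentParser(description="Korean NER (CharCNN + BiLSTM + CRF)")
    parser.add_argument("--do_train", action="store_true", help="Train the model")
    parser.add_argument("--do_eval", action="store_true", help="Evaluate on the test set")
    parser.add_argument("--write_pred", action="store_true",
                        help="Save evaluation predictions to the preds directory")
    return parser
```

With `store_true`, each flag defaults to `False` and becomes `True` only when given, so training and evaluation can be switched on independently.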
Results
| | Slot F1 (%) |
|---|---|
| CNN+BiLSTM+CRF | 73.65 |
| CNN+BiLSTM+CRF (+fastText) | 74.57 |
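Slot F1 is presumably entity-level F1, the kind of score seqeval (listed in Dependencies) computes: a predicted entity counts only if its type and span both match a gold entity. Below is a simplified stand-alone sketch of micro-averaged entity F1, assuming tags in the `B-XXX`/`I-XXX`/`O` scheme. It is not seqeval's code and may treat malformed tag sequences differently.

```python
from typing import Sequence, Set, Tuple

Entity = Tuple[str, int, int]  # (type, start index, end index), inclusive


def get_entities(tags: Sequence[str]) -> Set[Entity]:
    """Extract (type, start, end) chunks from one BIO-tagged sentence."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, label = tag.partition("-")
        # Close the open chunk unless this tag continues it (I- with the same type).
        if start is not None and not (prefix == "I" and label == etype):
            entities.add((etype, start, i - 1))
            start, etype = None, None
        # B- always opens a chunk; an I- that does not continue one also opens a new chunk.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return entities


def entity_f1(y_true: Sequence[Sequence[str]], y_pred: Sequence[Sequence[str]]) -> float:
    """Micro-averaged entity-level F1 over a corpus of sentences."""
    gold, pred = set(), set()
    for k, (t, p) in enumerate(zip(y_true, y_pred)):
        gold |= {(k,) + e for e in get_entities(t)}  # prefix with sentence id
        pred |= {(k,) + e for e in get_entities(p)}
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting only the `PER` entity in `B-PER I-PER O B-LOC` gives precision 1.0 and recall 0.5, so F1 is about 0.667.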