BERT-NER
Here "X" is used to represent "##eer", "##soo", and so on.
What are "##eer" and "##soo"?
If a word does not exist in BERT's vocabulary, the tokenizer breaks it into several smaller pieces that do exist in the vocabulary. For example, let's say the data contains the word 'Cats', and BERT's vocabulary contains 'Cat' and '##s' but not 'Cats'. The tokenizer will then break 'Cats' into ['Cat', '##s']. This is how BERT handles out-of-vocabulary words. In this implementation of BERT-NER, all such sub-word pieces (e.g. '##s') are assigned the label 'X'.
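To make the splitting and the 'X' labelling concrete, here is a minimal sketch in plain Python. It assumes a tiny made-up vocabulary (`TOY_VOCAB`) and a greedy longest-match-first split; the function names `wordpiece` and `align_labels` are hypothetical and are not taken from the BERT-NER code, so treat this as an illustration of the idea, not the repository's implementation.

```python
# Toy vocabulary for illustration only (an assumption, not BERT's real vocabulary).
TOY_VOCAB = {"Cat", "##s", "I", "like", "[UNK]"}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces get the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def align_labels(words, labels, vocab):
    """Keep the word's label on its first piece and give every later piece 'X'."""
    tokens, token_labels = [], []
    for word, label in zip(words, labels):
        for i, piece in enumerate(wordpiece(word, vocab)):
            tokens.append(piece)
            token_labels.append(label if i == 0 else "X")
    return tokens, token_labels


tokens, token_labels = align_labels(["I", "like", "Cats"], ["O", "O", "B-MISC"], TOY_VOCAB)
print(tokens)        # ['I', 'like', 'Cat', '##s']
print(token_labels)  # ['O', 'O', 'B-MISC', 'X']
```

The first piece keeps the original label, so the number of labels always matches the number of tokens after splitting.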
Hey, I want to know about the dataset. The first column is the word and the fourth column is the label; what do the second and third columns mean? Another question: in the output label_test.txt, the second and third columns are the same. Do they have a different meaning?
The files train.txt, dev.txt, and test.txt contain rows of the following form:

AT NNP B-NP O
TOP NNP I-NP O
In these files, the second column holds part-of-speech tags (e.g., 'JJ', 'NNP') and the third column holds chunk labels. Both columns are ignored when training the model, so you can put anything in them.
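As a rough sketch of what "ignored" means in practice, a reader only needs the first and last column of each row. The function name `read_conll` is hypothetical, and the sketch assumes that a blank line separates sentences; check your own files before relying on that.

```python
def read_conll(lines):
    """Yield (words, labels) per sentence, using only the first and last columns."""
    words, labels = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            # Assumption: a blank line marks the end of a sentence.
            if words:
                yield words, labels
                words, labels = [], []
            continue
        words.append(parts[0])    # column 1: the word
        labels.append(parts[-1])  # last column: the NER label
        # The middle columns (POS tag, chunk label) are never read.
    if words:
        yield words, labels


sample = ["AT NNP B-NP O", "TOP NNP I-NP O", "", "This - - O", "abc - - person"]
sentences = list(read_conll(sample))
print(sentences)
```

Because the middle columns are never read, placeholder values such as `-` work just as well as real tags.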
In labels_test.txt, the second column is the expected label and the third column is the predicted label.
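If you want a quick check of how often the two columns agree, something like the following could work. It is only a sketch: `token_accuracy` is a hypothetical helper, the sample rows are made up for illustration, and it assumes each row has exactly three whitespace-separated fields (word, expected, predicted).

```python
def token_accuracy(lines):
    """Fraction of rows whose expected label equals the predicted label."""
    total = correct = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        _, expected, predicted = parts
        total += 1
        if expected == predicted:
            correct += 1
    return correct / total if total else 0.0


# Made-up rows in the "word expected predicted" layout described above.
sample = ["This O O", "abc person O", "", ". O O", "exy person person"]
accuracy = token_accuracy(sample)
print(accuracy)  # 0.75
```

Token-level accuracy is dominated by the 'O' label, so an entity-level metric is usually more informative for NER.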
OK, I got it. How long will it take to run this script on a GPU?
I haven't tried it on the dataset included in the repository, so I can't tell.
OK. I am running it on my own data, but I have a problem, shown in the picture:
The author's data is on the left and mine is on the right.
It seems the code is unable to read your data. Does your train.txt file contain samples? Can you paste some example data here? My training files look like this:
I - - O
am - - O
with - - O
: - - O
exy - - person
. - - O

This - - O
: - - O
abc - - person
. - - O
Also, if you have entities other than those specified at https://github.com/kyzhouhzau/BERT-NER/blob/master/BERT_NER.py#L227, then you need to update that function.
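To find out which labels your own data actually uses before editing that function, you could scan the training file first. This is a sketch only: `collect_labels` is a hypothetical helper, not part of BERT_NER.py, and the `extra` argument assumes you also need the sub-word label 'X' mentioned earlier in this thread.

```python
def collect_labels(lines, extra=("X",)):
    """Return the sorted labels found in the last column, followed by extra labels."""
    labels = {line.split()[-1] for line in lines if line.split()}
    return sorted(labels) + [e for e in extra if e not in labels]


sample = ["I - - O", "am - - O", "abc - - person", "", ". - - O"]
found = collect_labels(sample)
print(found)  # ['O', 'person', 'X']
```

Any label that appears here but is missing from the list in the linked function would need to be added there.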
@FallakAsad Thank you. I have solved this problem.
@FallakAsad Now I want to change the CRF layer to an LSTM-CRF layer. I don't know how to modify the code; can you give me some advice?