BERT-NER
Pytorch-Named-Entity-Recognition-with-BERT
Hi, since BERT only supports tokenizing sequences of up to 512 tokens, how can I proceed if my text is longer than 512? I used BertForTokenClassification for entity recognition...
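One common workaround is sliding-window inference: encode the text into overlapping windows of at most 512 tokens and run the model on each window. A minimal sketch, assuming the Hugging Face `transformers` API rather than this repo's code (the model name, `num_labels`, and `stride` value are illustrative):

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)
model.eval()

def predict_long_text(text, max_length=512, stride=128):
    # Encode into overlapping windows; offset_mapping lets us map predictions
    # back to character spans in the original text.
    enc = tokenizer(
        text,
        max_length=max_length,
        stride=stride,
        truncation=True,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
        ).logits
    predictions = []
    for window_preds, offsets, mask in zip(
        logits.argmax(-1), enc["offset_mapping"], enc["attention_mask"]
    ):
        # Keep real tokens only: skip padding and special tokens (offset (0, 0)).
        for pred, (start, end), m in zip(window_preds, offsets, mask):
            if m == 1 and start != end:
                predictions.append((int(start), int(end), int(pred)))
    return predictions
```

Because the windows overlap, tokens in the overlap region are predicted twice; you would deduplicate by span, for example by keeping the prediction from the window where the token sits farthest from the edge.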
From `bert.py` (line 49):

```python
def tokenize(self, text: str):
    """ tokenize input"""
    words = word_tokenize(text)
    tokens = []
    valid_positions = []
    for i, word in enumerate(words):
        token = self.tokenizer.tokenize(word)
        tokens.extend(token)
        for i in range(len(token)):
            if...
```
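The truncated inner loop presumably fills `valid_positions` so that only the first sub-token of each word is marked valid; a sketch of that pattern (not necessarily the repo's exact code):

```python
# Mark the first WordPiece piece of each word as valid (1) and any additional
# pieces as invalid (0), so predictions can later be read off the first piece
# of every original word.
for i in range(len(token)):
    if i == 0:
        valid_positions.append(1)
    else:
        valid_positions.append(0)
```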
I would love to train this with bert-large, then fine-tune it on the BC5CDR-chem corpus for NER, and then use it to predict on unlabelled raw...
Hi, thanks for sharing the code. I just want to ask about "num_train_epochs": how many epochs are enough for the NER task?
CoNLL dataset: can predict only 5 tags. OntoNotes: can predict around 18 tags. Will the code work fine if I just replace the dataset with OntoNotes, or are there...
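At minimum, the label list and `num_labels` have to match the new dataset; a sketch of what changes (the OntoNotes type names are standard, but how this repo wires them in is an assumption):

```python
# CoNLL-2003 has 4 entity types (PER, ORG, LOC, MISC) plus O;
# OntoNotes 5.0 has 18 entity types.
ontonotes_types = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
]
ontonotes_labels = ["O"] + [f"{p}-{t}" for t in ontonotes_types for p in ("B", "I")]

# num_labels passed to the model must match this list (plus any special
# labels, e.g. for [CLS]/[SEP], if the data processor adds them).
num_labels = len(ontonotes_labels)
```

Beyond the label list, the OntoNotes files would also need to be converted into whatever column format the data reader expects.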
It's not a floating-point issue between devices where only the certainty changes: on GPU I get (4/18), on CPU (16/18). In order to hard-code GPU and then...
The supported sequence length of BERT is up to 512 tokens. Adding simple sentence tokenization to the API would enable users to process longer texts.
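A minimal sketch of that idea, assuming NLTK's sentence tokenizer and a per-sentence `predict()` entry point (the method name is illustrative):

```python
from nltk.tokenize import sent_tokenize

def ner_long_text(model, text):
    # Split the document into sentences so each model call stays well under
    # BERT's 512-token limit, then concatenate the per-sentence results.
    results = []
    for sentence in sent_tokenize(text):
        results.extend(model.predict(sentence))
    return results
```

Pathologically long sentences could still exceed the limit and would need extra chunking, but sentence splitting covers most inputs.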
In the function `convert_examples_to_features`, each word may be split into more than one sub-token by the BERT tokenizer:

```
token = tokenizer.tokenize(word)
tokens.extend(token)
```

but the length of labels remains the same....
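One way to keep tokens and labels the same length (a sketch, not the repo's exact code) is to extend the label list in step with the sub-tokens, giving the real label to the first piece and a placeholder label to the rest:

```python
from transformers import BertTokenizer

# Sketch of label alignment with WordPiece sub-tokens, as would be done inside
# convert_examples_to_features. The placeholder label "X" is an assumption.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
label_map = {"X": 0, "O": 1, "B-PER": 2, "I-PER": 3}

words = ["Johanson", "lives", "here"]
labels = ["B-PER", "O", "O"]

tokens, label_ids = [], []
for word, label in zip(words, labels):
    word_pieces = tokenizer.tokenize(word)  # may return several pieces, e.g. ["Johan", "##son"]
    tokens.extend(word_pieces)
    label_ids.append(label_map[label])                            # first piece keeps the label
    label_ids.extend([label_map["X"]] * (len(word_pieces) - 1))   # remaining pieces get "X"

assert len(tokens) == len(label_ids)
```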
Hello, thank you for your great work! The F1 score can reach the high level mentioned in this repo using the experiment branch. However, when I tried to train the...