BERT-NER
Pytorch-Named-Entity-Recognition-with-BERT
Hi, since BERT only supports tokenizing sequences of up to 512 tokens, how can I proceed if my text is longer than 512? I used BertForTokenClassification for entity recognition...
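One common workaround is sliding-window inference: encode the text into overlapping windows of at most 512 tokens and run the model on each window. A minimal sketch, assuming the Hugging Face `transformers` API rather than this repo's code (the model name, `num_labels`, and `stride` value are illustrative):

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)
model.eval()

def predict_long_text(text, max_length=512, stride=128):
    # Encode into overlapping windows; offset_mapping lets us map predictions
    # back to character spans in the original text.
    enc = tokenizer(
        text,
        max_length=max_length,
        stride=stride,
        truncation=True,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
        ).logits
    predictions = []
    for window_preds, offsets, mask in zip(
        logits.argmax(-1), enc["offset_mapping"], enc["attention_mask"]
    ):
        # Keep real tokens only: skip padding and special tokens (offset (0, 0)).
        for pred, (start, end), m in zip(window_preds, offsets, mask):
            if m == 1 and start != end:
                predictions.append((int(start), int(end), int(pred)))
    return predictions
```

Because the windows overlap, tokens in the overlap region are predicted twice; you would deduplicate by span, for example by keeping the prediction from the window where the token sits farthest from the edge.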
From `bert.py` (line 49):

```python
def tokenize(self, text: str):
    """ tokenize input"""
    words = word_tokenize(text)
    tokens = []
    valid_positions = []
    for i, word in enumerate(words):
        token = self.tokenizer.tokenize(word)
        tokens.extend(token)
        for i in range(len(token)):
            if...
```
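The truncated inner loop presumably fills `valid_positions` so that only the first sub-token of each word is marked valid; a sketch of that pattern (not necessarily the repo's exact code):

```python
# Mark the first WordPiece piece of each word as valid (1) and any additional
# pieces as invalid (0), so predictions can later be read off the first piece
# of every original word.
for i in range(len(token)):
    if i == 0:
        valid_positions.append(1)
    else:
        valid_positions.append(0)
```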
I would love to train this with bert-large, then fine-tune it on the BC5CDR-chem corpus for NER, and then use it to predict on unlabelled raw...
Hi, thanks for sharing the code. I just want to ask about "num_train_epochs": how many epochs are enough for the NER task?
CoNLL dataset: can predict only 5 tags. OntoNotes: can predict around 18 tags. Will the code work fine if I just replace the dataset with OntoNotes, or are there...
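At minimum, the label list and `num_labels` have to match the new dataset; a sketch of what changes (the OntoNotes type names are standard, but how this repo wires them in is an assumption):

```python
# CoNLL-2003 has 4 entity types (PER, ORG, LOC, MISC) plus O;
# OntoNotes 5.0 has 18 entity types.
ontonotes_types = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
]
ontonotes_labels = ["O"] + [f"{p}-{t}" for t in ontonotes_types for p in ("B", "I")]

# num_labels passed to the model must match this list (plus any special
# labels, e.g. for [CLS]/[SEP], if the data processor adds them).
num_labels = len(ontonotes_labels)
```

Beyond the label list, the OntoNotes files would also need to be converted into whatever column format the data reader expects.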
It's not a floating-point issue between devices where only the certainty changes: on GPU I get (4/18), on CPU (16/18). In order to hard-code GPU and then...
The supported sequence length of BERT is up to 512 tokens. Adding simple sentence tokenization to the API would enable users to process longer texts.
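A minimal sketch of that idea, assuming NLTK's sentence tokenizer and a per-sentence `predict()` entry point (the method name is illustrative):

```python
from nltk.tokenize import sent_tokenize

def ner_long_text(model, text):
    # Split the document into sentences so each model call stays well under
    # BERT's 512-token limit, then concatenate the per-sentence results.
    results = []
    for sentence in sent_tokenize(text):
        results.extend(model.predict(sentence))
    return results
```

Pathologically long sentences could still exceed the limit and would need extra chunking, but sentence splitting covers most inputs.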
In the function `convert_examples_to_features`, each word may be split into more than one sub-token by the BERT tokenizer:

```
token = tokenizer.tokenize(word)
tokens.extend(token)
```

but the length of labels remains the same....
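One way to keep tokens and labels the same length (a sketch, not the repo's exact code) is to extend the label list in step with the sub-tokens, giving the real label to the first piece and a placeholder label to the rest:

```python
from transformers import BertTokenizer

# Sketch of label alignment with WordPiece sub-tokens, as would be done inside
# convert_examples_to_features. The placeholder label "X" is an assumption.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
label_map = {"X": 0, "O": 1, "B-PER": 2, "I-PER": 3}

words = ["Johanson", "lives", "here"]
labels = ["B-PER", "O", "O"]

tokens, label_ids = [], []
for word, label in zip(words, labels):
    word_pieces = tokenizer.tokenize(word)  # may return several pieces, e.g. ["Johan", "##son"]
    tokens.extend(word_pieces)
    label_ids.append(label_map[label])                            # first piece keeps the label
    label_ids.extend([label_map["X"]] * (len(word_pieces) - 1))   # remaining pieces get "X"

assert len(tokens) == len(label_ids)
```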
Hello, thank you for your great work! The F1 score can reach the high level mentioned in this repo using the experiment branch. However, when I tried to train the...