bert-for-tf2
Named-entity recognition
How would you approach named-entity recognition with this library?
I am working on a similar sequence tagging task, argument candidate identification. Essentially, BERT or ALBERT handles encoding the raw input; you then need a layer on top of BERT|ALBERT to decode the per-token representations into the desired target sequence.
I would essentially follow this example here: https://github.com/kpe/bert-for-tf2/blob/master/examples/gpu_movie_reviews.ipynb
Under `create_model`, you would need to modify the layers after the BERT|ALBERT layer to map to your output sequence dimension. I will probably do this task in another repo and can post some results soon.
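For illustration only (this is not code from the notebook, and names like `l_bert`, `max_seq_len`, and `num_tags` are my own assumptions), the head in `create_model` might change roughly like this for token-level tagging:

```python
import tensorflow as tf

def create_model(l_bert, max_seq_len, num_tags):
    """Sketch of a token-level tagging head on top of a bert-for-tf2 layer."""
    input_ids = tf.keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
    seq_output = l_bert(input_ids)            # (batch, max_seq_len, hidden_size)
    # instead of pooling to one sentence label, classify every token position
    logits = tf.keras.layers.Dense(num_tags, activation='softmax', name="tags")(seq_output)
    return tf.keras.models.Model(inputs=input_ids, outputs=logits)
```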
@kpe you mentioned in #30 to ignore the activations of the padding in the output layer, would you also suggest doing this for a sequence tagging task? If so, how would you propose doing this in the output layer?
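If that is the right way to go, I imagine per-token sample weights would be one option; a minimal sketch, assuming a model like the one above and that 0 is the [PAD] token id:

```python
import numpy as np
import tensorflow as tf

# x_train: (num_examples, max_seq_len) int32 token ids, 0 assumed to be [PAD]
# y_train: (num_examples, max_seq_len, num_tags) one-hot tags
pad_mask = (x_train != 0).astype(np.float32)   # 1.0 for real tokens, 0.0 for padding

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy",
              weighted_metrics=["categorical_accuracy"])
# a (num_examples, max_seq_len) sample_weight zeroes padded positions out of the loss
# (older TF 2.0/2.1 may also need sample_weight_mode="temporal" in compile)
model.fit(x_train, y_train, sample_weight=pad_mask, batch_size=16, epochs=1)
```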
Also, thank you for this awesome repo. Minor issue though: under NEWS on the readme, I think the first entry should be 6th Jan 2020. Just a minor thing, no biggie :)
Any update on NER tasks with this library?
If there is a NER example with this library, that would be very helpful!
Hi, as I managed to use this library for a NER task, I am happy to share my experience. Sorry, I can't share the whole code, but I will try to explain the key parts.
- The input text is tokenized by the tokenizer module and padded to a specified max length (in my case 200 tokens at most)
- For each token the output tag is transformed into a one-hot vector, and if the tokenizer broke one word into multiple tokens, I used the word's tag for the first token and [MASK] for the remaining pieces of the original word (a rough sketch of this alignment is below, after the model code)
- So if I have X sentences in the training set, the input shape is (X, 200), where 200 is the padded length of each sentence. In this case the output shape is (X, 200, NUMBER_OF_TAGS). NUMBER_OF_TAGS is the number of your entity types; it depends on whether you use BIOE or just BIO, and here you also add the special tokens [CLS], [PAD], [MASK]. In my case these are the tags:
['B-ORG', 'I-ORG', 'B-MISC', 'I-MISC', 'B-LOC', 'I-LOC', 'B-PER', 'I-PER', 'O', '[CLS]', '[MASK]', '[PAD]'].
This way my shapes are (X,200) and (X,200,12).
- Load the BERT model the same way as in the classification example, but use a different model architecture for the remaining layers, since this is not just a classification task. This is basically the example code from the package description with a little tweak:
```python
import bert as bert_tf2   # pip package: bert-for-tf2
import tensorflow as tf

# bert_params comes from the checkpoint, as in the classification example
bert_layer = bert_tf2.BertModelLayer.from_params(bert_params, name="bert")

input_ids = tf.keras.layers.Input(shape=(200,), dtype='int32')
output = bert_layer(input_ids)                       # (batch, 200, hidden_size)
output = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(units=12, activation='softmax'))(output)  # one softmax per token
model = tf.keras.models.Model(inputs=input_ids, outputs=output)
model.build(input_shape=(None, 200))
bert_layer.apply_adapter_freeze()
bert_layer.embeddings_layer.trainable = False
```
The magic here is the TimeDistributed wrapper layer. My results after just 1 epoch on 29k training sentences: loss: 0.0227 - categorical_accuracy: 0.9933 - val_loss: 0.0042 - val_categorical_accuracy: 0.9988
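For clarity, here is a rough sketch of the tag alignment described above (the helper name `align_tags` is just illustrative, and any FullTokenizer-style tokenizer with `tokenize()` and `convert_tokens_to_ids()` should work; this is not my exact code):

```python
def align_tags(words, word_tags, tokenizer, max_len=200):
    """Align word-level tags with word pieces: first piece keeps the tag, the rest get [MASK]."""
    tokens, tags = ['[CLS]'], ['[CLS]']
    for word, tag in zip(words, word_tags):
        pieces = tokenizer.tokenize(word)                 # one word may become several pieces
        tokens.extend(pieces)
        tags.extend([tag] + ['[MASK]'] * (len(pieces) - 1))
    tokens = (tokens + ['[PAD]'] * max_len)[:max_len]     # pad / truncate to max_len
    tags = (tags + ['[PAD]'] * max_len)[:max_len]
    return tokenizer.convert_tokens_to_ids(tokens), tags  # tags still need one-hot encoding
```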
So basically, that's it folks :)