biaffine-ner
Running with fixed embeddings only
Hi and thanks for putting this code up! Is there a way to run the model with only fixed word embeddings, like glove, fasttext etc., but without bert?
If you comment out lines 197-207 in biaffine_ner_model.py and replace line 23 with self.lm_file = None, it should work without BERT.
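A minimal sketch of what that change amounts to (the class and attribute names mirror biaffine_ner_model.py, but the method body here is a simplified assumption, not the repo's actual code):

```python
import numpy as np

class BiaffineNerModelSketch:
    """Simplified sketch: with lm_file set to None, the model falls back
    to fixed word embeddings (GloVe/fastText) alone, with no BERT features."""

    def __init__(self, lm_path=None):
        # Line 23 in the repo loads an h5py LM cache; setting this to None disables it.
        self.lm_file = None if lm_path is None else lm_path

    def token_embeddings(self, word_emb, lm_emb=None):
        if self.lm_file is None:
            return word_emb  # fixed embeddings only
        # Otherwise the contextual features would be combined in, e.g.:
        return np.concatenate([word_emb, lm_emb], axis=-1)

word_emb = np.zeros((5, 50))  # 5 tokens, 50-dim fixed embeddings
model = BiaffineNerModelSketch()
assert model.token_embeddings(word_emb).shape == (5, 50)
```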
Thanks for the quick reply! This runs successfully now, but despite playing around with hyperparameters I can't get it to predict anything but the 'O' category for anything. To give some context, I'm trying to get baseline numbers for various neural systems for nested NER in a low resource setting for span detection. I'm not trying to get high numbers, but ideally non-zero :)
I have:
- Fixed 50d word embeddings from a low resource language
- ~40K train tokens, 10K dev, 10K test, divided into a few documents (about 80 documents total)
- ~6K nested entity spans within these, in 10 categories, which I collapsed into just one non-'O' category
Loss is decreasing throughout training, but predictions on dev are always 'O'. Oddly, even if I feed train as eval_path again, it still only predicts 'O' in candidate_ner_scores in evaluate. Any ideas would be appreciated!
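A quick way to quantify how skewed the candidate spans are in a setup like this (stdlib only; the parallel `sentences`/`ners` fields follow the repo's jsonlines input format, but that layout is an assumption here):

```python
import json
import tempfile

def span_label_stats(jsonlines_path):
    """Count gold entity spans vs. all candidate spans (every start <= end pair)."""
    positives = candidates = 0
    with open(jsonlines_path) as f:
        for line in f:
            doc = json.loads(line)
            for sent, ners in zip(doc["sentences"], doc["ners"]):
                n = len(sent)
                candidates += n * (n + 1) // 2  # number of spans with start <= end
                positives += len(ners)
    return positives, candidates

# Tiny synthetic document in the assumed format: one 3-token sentence, one gold span.
with tempfile.NamedTemporaryFile("w", suffix=".jsonlines", delete=False) as f:
    json.dump({"sentences": [["a", "b", "c"]], "ners": [[[0, 1, "thing"]]]}, f)
    f.write("\n")
    path = f.name

positives, candidates = span_label_stats(path)  # 1 positive out of 6 candidate spans
```

Since each sentence of length n contributes n(n+1)/2 candidate spans, ~6K positives over ~40K tokens leave the 'O' class dominating by a wide margin, which can push the model toward predicting all 'O'.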
Here is my config (I tried making the model really small here to get a non-zero result, but I've played with various values):
test = ${base}{
train_path = train.jsonlines
lm_path = xyz
eval_path = dev.jsonlines
test_path = test.jsonlines
ner_types = ["thing"]
char_vocab_path = "char_vocab.txt"
context_embeddings = ${w2v_50d}
lm_size = 1
lm_layers = 1
flat_ner = false
contextualization_size = 40
contextualization_layers = 1
eval_frequency = 150
report_frequency = 50
log_root = logs
max_step = 8000
lstm_dropout_rate = 0.2
lexical_dropout_rate = 0.2
dropout_rate = 0.2
learning_rate = 0.001
ffnn_size = 30
ffnn_depth = 1
char_embedding_size = 4
}
And here is some training output with decreasing loss, but 0 f-score on dev:
[50] loss=5979.44, steps/s=25.96
[100] loss=6474.12, steps/s=28.04
[150] loss=5528.80, steps/s=29.18
Loaded 13 eval examples.
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 28924.54 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[150] evaL_f1=0.00, max_f1=0.00 at step 0
[200] loss=4291.26, steps/s=27.66
[250] loss=4213.31, steps/s=28.87
[300] loss=4266.32, steps/s=29.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32010.57 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[300] evaL_f1=0.00, max_f1=0.00 at step 0
[350] loss=4609.43, steps/s=28.40
[400] loss=3626.02, steps/s=28.74
[450] loss=3158.52, steps/s=29.28
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33300.01 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[450] evaL_f1=0.00, max_f1=0.00 at step 0
[500] loss=3116.23, steps/s=28.55
[550] loss=2190.09, steps/s=29.40
[600] loss=3685.80, steps/s=28.79
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33556.00 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[600] evaL_f1=0.00, max_f1=0.00 at step 0
[650] loss=1716.10, steps/s=28.78
[700] loss=2771.52, steps/s=28.65
[750] loss=1920.65, steps/s=28.89
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32705.53 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[750] evaL_f1=0.00, max_f1=0.00 at step 0
[800] loss=1655.52, steps/s=28.60
[850] loss=1689.42, steps/s=28.73
[900] loss=1852.03, steps/s=28.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32728.62 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[900] evaL_f1=0.00, max_f1=0.00 at step 0
[950] loss=1338.44, steps/s=28.34
[1000] loss=1473.17, steps/s=28.32
[1050] loss=1163.35, steps/s=28.45
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 30699.79 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
Never mind, I figured it out - BTW I ported this to TF2.2 with tf.compat.v1 and Python 3.X, I can push it to a fork/PR if you're interested
Hi, sorry for the late reply :) May I ask how you solved the all-'O' problem? I haven't had this problem in any of my experiments; I assume it might be because of the size of the corpus. Did you do undersampling?
For the updated code, feel free to push it, and if you could attach the address of your repository here, people who want to use it can find it easily :) Thanks.
OK, my changes are now in PR #9
You can also see the low-resource parameters I used in experiments.conf. I'm getting F1=0.757 for span detection (no entity type classification). I'm comparing it to syntax-tree-based spans (predicting candidates for type classification using a normal dependency parser), which currently gets 82.3 with predicted POS tags and parses, and 87 with gold parses.
Thanks a lot, Amir, I've included a link in the readme so people can find your TF2.0-ready code. For span detection in the under-resourced case, you might want to use undersampling: mask out a large portion of the negative examples during training, keeping, say, T (e.g. 5) negative examples per positive example. You can do this by simply adding a new boolean placeholder (us_masks) with the same shape as gold_labels and modifying lines 135-145 as:
us_ratio = config['under_sampling_ratio']  # can be calculated as T * num_positive_examples / num_negative_examples
gold_labels = []
us_masks = []
for sid, sent in enumerate(sentences):
    ner = {(s, e): self.ner_maps[t] for s, e, t in ners[sid]}
    for s in range(len(sent)):
        for e in range(s, len(sent)):
            label = ner.get((s, e), 0) if is_training else 0
            gold_labels.append(label)
            # Keep every positive; keep each negative with probability us_ratio (training only).
            mask = (np.random.rand() < us_ratio if label == 0 else True) if is_training else True
            us_masks.append(mask)
us_masks = np.array(us_masks)
gold_labels = np.array(gold_labels)
example_tensors = (tokens, context_word_emb, lm_emb, char_index, text_len, is_training, gold_labels, us_masks)
And before computing the loss at line 246:
candidate_ner_scores = tf.boolean_mask(candidate_ner_scores, us_masks)
gold_labels = tf.boolean_mask(gold_labels, us_masks)
I find this method very helpful when dealing with under-resourced cases (I did the same for other tasks using a similar architecture).
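To make the effect of the ratio concrete, here is a self-contained numpy sketch of the same undersampling idea outside TensorFlow (the array size, number of positives, and T value are made up for illustration; `gold_labels[...] != 0` marks positives as in the snippet above):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5  # target: keep roughly T negative candidates per positive
gold_labels = np.zeros(10000, dtype=int)
gold_labels[rng.choice(10000, size=200, replace=False)] = 1  # 200 positive spans

num_pos = int((gold_labels != 0).sum())
num_neg = int((gold_labels == 0).sum())
us_ratio = T * num_pos / num_neg  # probability of keeping a given negative

# Keep all positives; keep each negative with probability us_ratio.
us_masks = (gold_labels != 0) | (rng.random(gold_labels.shape) < us_ratio)

kept_labels = gold_labels[us_masks]  # numpy analogue of tf.boolean_mask
assert (kept_labels != 0).sum() == num_pos  # no positives are dropped
```

After masking, the surviving negatives number roughly T * num_pos, so the loss is computed over a far more balanced label distribution than the raw n(n+1)/2 candidate spans per sentence.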