biaffine-ner
Running with fixed embeddings only
Hi and thanks for putting this code up! Is there a way to run the model with only fixed word embeddings, like glove, fasttext etc., but without bert?
If you comment out lines 197-207 in biaffine_ner_model.py and replace line 23 with self.lm_file = None, it should work without BERT.
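A minimal sketch of what that change amounts to (the class and attribute names mirror biaffine_ner_model.py, but the method body here is a simplified assumption, not the repo's actual code):

```python
import numpy as np

class BiaffineNerModelSketch:
    """Simplified sketch: with lm_file set to None, the model falls back
    to fixed word embeddings (GloVe/fastText) alone, with no BERT features."""

    def __init__(self, lm_path=None):
        # Line 23 in the repo loads an h5py LM cache; setting this to None disables it.
        self.lm_file = None if lm_path is None else lm_path

    def token_embeddings(self, word_emb, lm_emb=None):
        if self.lm_file is None:
            return word_emb  # fixed embeddings only
        # Otherwise the contextual features would be combined in, e.g.:
        return np.concatenate([word_emb, lm_emb], axis=-1)

word_emb = np.zeros((5, 50))  # 5 tokens, 50-dim fixed embeddings
model = BiaffineNerModelSketch()
assert model.token_embeddings(word_emb).shape == (5, 50)
```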
Thanks for the quick reply! This runs successfully now, but despite playing around with hyperparameters I can't get it to predict anything but the 'O' category for anything. To give some context, I'm trying to get baseline numbers for various neural systems for nested NER in a low resource setting for span detection. I'm not trying to get high numbers, but ideally non-zero :)
I have:
- Fixed 50d word embeddings from a low resource language
- ~40K train tokens, 10K dev, 10K test, divided into a few documents (about 80 documents total)
- ~6K nested entity spans within these, in 10 categories, which I collapsed into just one non-'O' category
Loss is decreasing throughout training, but predictions on dev are always 'O'. Oddly, even if I feed train as eval_path again, it still only predicts 'O' in candidate_ner_scores in evaluate. Any ideas would be appreciated!
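A quick way to quantify how skewed the candidate spans are in a setup like this (stdlib only; the parallel `sentences`/`ners` fields follow the repo's jsonlines input format, but that layout is an assumption here):

```python
import json
import tempfile

def span_label_stats(jsonlines_path):
    """Count gold entity spans vs. all candidate spans (every start <= end pair)."""
    positives = candidates = 0
    with open(jsonlines_path) as f:
        for line in f:
            doc = json.loads(line)
            for sent, ners in zip(doc["sentences"], doc["ners"]):
                n = len(sent)
                candidates += n * (n + 1) // 2  # number of spans with start <= end
                positives += len(ners)
    return positives, candidates

# Tiny synthetic document in the assumed format: one 3-token sentence, one gold span.
with tempfile.NamedTemporaryFile("w", suffix=".jsonlines", delete=False) as f:
    json.dump({"sentences": [["a", "b", "c"]], "ners": [[[0, 1, "thing"]]]}, f)
    f.write("\n")
    path = f.name

positives, candidates = span_label_stats(path)  # 1 positive out of 6 candidate spans
```

Since each sentence of length n contributes n(n+1)/2 candidate spans, ~6K positives over ~40K tokens leave the 'O' class dominating by a wide margin, which can push the model toward predicting all 'O'.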
Here is my config (I tried making the model really small here to get a non-zero result, but I've played with various values):
test = ${base}{
train_path = train.jsonlines
lm_path = xyz
eval_path = dev.jsonlines
test_path = test.jsonlines
ner_types = ["thing"]
char_vocab_path = "char_vocab.txt"
context_embeddings = ${w2v_50d}
lm_size = 1
lm_layers = 1
flat_ner = false
contextualization_size = 40
contextualization_layers = 1
eval_frequency = 150
report_frequency = 50
log_root = logs
max_step = 8000
lstm_dropout_rate = 0.2
lexical_dropout_rate = 0.2
dropout_rate = 0.2
learning_rate = 0.001
ffnn_size = 30
ffnn_depth = 1
char_embedding_size = 4
}
And here is some training output with decreasing loss, but 0 f-score on dev:
[50] loss=5979.44, steps/s=25.96
[100] loss=6474.12, steps/s=28.04
[150] loss=5528.80, steps/s=29.18
Loaded 13 eval examples.
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 28924.54 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[150] evaL_f1=0.00, max_f1=0.00 at step 0
[200] loss=4291.26, steps/s=27.66
[250] loss=4213.31, steps/s=28.87
[300] loss=4266.32, steps/s=29.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32010.57 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[300] evaL_f1=0.00, max_f1=0.00 at step 0
[350] loss=4609.43, steps/s=28.40
[400] loss=3626.02, steps/s=28.74
[450] loss=3158.52, steps/s=29.28
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33300.01 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[450] evaL_f1=0.00, max_f1=0.00 at step 0
[500] loss=3116.23, steps/s=28.55
[550] loss=2190.09, steps/s=29.40
[600] loss=3685.80, steps/s=28.79
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33556.00 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[600] evaL_f1=0.00, max_f1=0.00 at step 0
[650] loss=1716.10, steps/s=28.78
[700] loss=2771.52, steps/s=28.65
[750] loss=1920.65, steps/s=28.89
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32705.53 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[750] evaL_f1=0.00, max_f1=0.00 at step 0
[800] loss=1655.52, steps/s=28.60
[850] loss=1689.42, steps/s=28.73
[900] loss=1852.03, steps/s=28.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32728.62 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[900] evaL_f1=0.00, max_f1=0.00 at step 0
[950] loss=1338.44, steps/s=28.34
[1000] loss=1473.17, steps/s=28.32
[1050] loss=1163.35, steps/s=28.45
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 30699.79 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
Never mind, I figured it out - BTW I ported this to TF2.2 with tf.compat.v1 and Python 3.X, I can push it to a fork/PR if you're interested
Hi, sorry for the late reply :) May I ask how you solved the all-'O' problem? I haven't had this problem in any of my experiments; I assume it might be because of the size of the corpus. Did you do undersampling?
For the updated code, feel free to push it, and if you could attach the address of your repository here, people who want to use it can find it easily :) Thanks.
OK, my changes are now in PR #9
You can also see the low-resource parameters I used in experiments.conf. I'm getting F1=0.757 for span detection (no entity type classification). I'm comparing it to syntax-tree-based spans (predicting candidates for type classification using a normal dependency parser), which currently gets 82.3 with predicted POS tags and parses, and 87 with gold parses.
Thanks a lot, Amir, I've included a link in the readme so people can find your TF2.0-ready code. For span detection in the under-resourced case, you might want to use undersampling: mask out a large portion of the negative examples during training, keeping, say, T (e.g. 5) negative examples per positive example. You can do this by simply adding a new boolean placeholder (us_masks) with the same shape as gold_labels and modifying lines 135-145 as:
us_ratio = config['under_sampling_ratio']  # can be calculated as T * num_positive_examples / num_negative_examples
gold_labels = []
us_masks = []
for sid, sent in enumerate(sentences):
    ner = {(s, e): self.ner_maps[t] for s, e, t in ners[sid]}
    for s in range(len(sent)):
        for e in range(s, len(sent)):
            label = ner.get((s, e), 0) if is_training else 0
            gold_labels.append(label)
            # Keep every positive; keep each negative with probability us_ratio (training only).
            mask = (np.random.rand() < us_ratio if label == 0 else True) if is_training else True
            us_masks.append(mask)
us_masks = np.array(us_masks)
gold_labels = np.array(gold_labels)
example_tensors = (tokens, context_word_emb, lm_emb, char_index, text_len, is_training, gold_labels, us_masks)
And before computing the loss at line 246:
candidate_ner_scores = tf.boolean_mask(candidate_ner_scores, us_masks)
gold_labels = tf.boolean_mask(gold_labels, us_masks)
I find this method very helpful when dealing with under-resourced cases (I did the same for other tasks using a similar architecture).
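To make the effect of the ratio concrete, here is a self-contained numpy sketch of the same undersampling idea outside TensorFlow (the array size, number of positives, and T value are made up for illustration; `gold_labels[...] != 0` marks positives as in the snippet above):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5  # target: keep roughly T negative candidates per positive
gold_labels = np.zeros(10000, dtype=int)
gold_labels[rng.choice(10000, size=200, replace=False)] = 1  # 200 positive spans

num_pos = int((gold_labels != 0).sum())
num_neg = int((gold_labels == 0).sum())
us_ratio = T * num_pos / num_neg  # probability of keeping a given negative

# Keep all positives; keep each negative with probability us_ratio.
us_masks = (gold_labels != 0) | (rng.random(gold_labels.shape) < us_ratio)

kept_labels = gold_labels[us_masks]  # numpy analogue of tf.boolean_mask
assert (kept_labels != 0).sum() == num_pos  # no positives are dropped
```

After masking, the surviving negatives number roughly T * num_pos, so the loss is computed over a far more balanced label distribution than the raw n(n+1)/2 candidate spans per sentence.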