bert_ner
F1, recall and precision calculation
Hi, I was wondering how you are actually calculating your scores.
y_true = np.array([hp.tag2idx[line.split()[1]] for line in open(f, 'r').read().splitlines() if len(line) > 0])
y_pred = np.array([hp.tag2idx[line.split()[2]] for line in open(f, 'r').read().splitlines() if len(line) > 0])
num_proposed = len(y_pred[y_pred>1])
num_correct = (np.logical_and(y_true==y_pred, y_true>1)).astype(int).sum()  # np.int was removed in NumPy 1.24; use the built-in int
num_gold = len(y_true[y_true>1])
precision = num_correct / num_proposed
recall = num_correct / num_gold
Can you explain what the above code means? How does it translate to, say, recall = TP / (TP + FN)? Don't you have to use some multi-class method?
Also, why are you only taking the indices where y_true>1? Is it because you do not want the Other tag to skew your results? Thanks!
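For reference, here is a minimal sketch of how the counts in the snippet line up with TP, FP and FN. It assumes that indices 0 and 1 in tag2idx are non-entity tags (for example padding and "O"), which is what the y_true>1 filter suggests; the tag vocabulary and the function name token_level_scores are made up for illustration and are not taken from the repository. Under that assumption, num_correct is TP, num_proposed is TP + FP and num_gold is TP + FN, all pooled over every entity tag (a micro-average at token level).

```python
# Hypothetical tag vocabulary. The only assumption carried over from the
# snippet is that indices 0 and 1 are non-entity tags.
tag2idx = {"<PAD>": 0, "O": 1, "B-PER": 2, "I-PER": 3, "B-LOC": 4, "I-LOC": 5}


def token_level_scores(y_true, y_pred, first_entity_idx=2):
    """Token-level precision/recall/F1, pooled over all entity tags."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        gold_is_entity = t >= first_entity_idx
        pred_is_entity = p >= first_entity_idx
        if gold_is_entity and t == p:
            tp += 1  # what the snippet counts in num_correct
        else:
            if pred_is_entity:
                fp += 1  # proposed an entity tag that is wrong
            if gold_is_entity:
                fn += 1  # gold entity token missed or mislabeled
    num_proposed = tp + fp  # same as len(y_pred[y_pred > 1])
    num_gold = tp + fn      # same as len(y_true[y_true > 1])
    precision = tp / num_proposed if num_proposed else 0.0
    recall = tp / num_gold if num_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


gold = ["O", "B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["O", "B-PER", "O", "O", "B-LOC", "B-LOC"]
scores = token_level_scores([tag2idx[t] for t in gold],
                            [tag2idx[t] for t in pred])
print(scores)
```

Note that a token whose gold tag is one entity tag and whose predicted tag is a different entity tag counts as both a false positive and a false negative, and that correctly predicted "O" tokens never enter the counts, so the large "O" class cannot inflate the scores.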
I think this is a rather crude measure, since it counts every kind of tag (including I-XXXX) in the computation.
Using a standard evaluation tool, such as https://github.com/sighsmile/conlleval/blob/master/conlleval.py, is preferable.
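To illustrate the difference, here is a rough sketch of entity-level scoring on BIO tags, where a prediction only counts as correct if the whole span and its type match the gold entity. This is a simplified stand-in written for this thread, not the conlleval implementation: the helper names extract_entities and entity_level_scores are hypothetical, it handles a single sentence at a time, and it treats a stray I-XXXX tag as the start of a new entity, which is an assumption about how to handle malformed sequences.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO tag sequence.

    `end` is exclusive. A stray I- tag (no matching open span) is treated
    as the start of a new entity; this is an assumption of the sketch.
    """
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((etype, start, i))  # close the open span
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]  # open a new span
    return entities


def entity_level_scores(gold_tags, pred_tags):
    """Precision/recall/F1 where only exact span-and-type matches count."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold_tags = ["O", "B-PER", "I-PER", "O", "B-LOC", "O"]
pred_tags = ["O", "B-PER", "O", "O", "B-LOC", "B-LOC"]
entity_scores = entity_level_scores(gold_tags, pred_tags)
print(entity_scores)
```

On this toy sentence the truncated PER span earns no credit at entity level, even though its B-PER token would count as correct in a token-level computation, so the two ways of scoring can disagree noticeably.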