TPlinker-joint-extraction
Bug in decode_rel in tplinker_plus.py
Thanks to the author for sharing the code. While using a trained model for pre-annotation, I found a bug in decode_rel in tplinker_plus.py:
    # head link
    for sp in matrix_spots:
        ...........
        # recover the positions in the original text
        for ent in ent_list:
            ent["char_span"] = [ent["char_span"][0] + char_offset, ent["char_span"][1] + char_offset]
            ent["tok_span"] = [ent["tok_span"][0] + tok_offset, ent["tok_span"][1] + tok_offset]
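The entity span recovery should be placed outside the loop above. Otherwise the offsets are added once per iteration and decoding produces wrong spans, as in the example below.
The text is 2001 characters long in total, yet the output entities have char_span values such as [2853, 2866]: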
'relation_list': [{'subject': 'SAR444245', 'object': 'every 3 weeks', 'subj_tok_span': [405, 410], 'obj_tok_span': [418, 421], 'subj_char_span': [1165, 1174], 'obj_char_span': [1193, 1206], 'predicate': '/Drug/FREQUENCY/Drug-FREQUENCY'}],
'entity_list': [
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [663, 668], 'char_span': [2326, 2335]},
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [669, 676], 'char_span': [2340, 2353]},
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [705, 710], 'char_span': [2455, 2464]},
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [711, 718], 'char_span': [2469, 2482]},
{'type': 'FREQUENCY', 'text': 'every 3 weeks', 'tok_span': [718, 721], 'char_span': [2483, 2496]},
{'type': 'Drug', 'text': 'SAR444245', 'tok_span': [746, 751], 'char_span': [2583, 2592]},
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [752, 759], 'char_span': [2597, 2610]},
{'type': 'FREQUENCY', 'text': 'every 3 weeks', 'tok_span': [759, 762], 'char_span': [2611, 2624]},
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [793, 800], 'char_span': [2725, 2738]},
{'type': 'Drug', 'text': 'pembrolizumab', 'tok_span': [834, 841], 'char_span': [2853, 2866]}]}
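The effect of the misplaced loop can be reproduced with a small sketch. This is not the repository's code: `recover_inside_loop`, `recover_outside_loop`, and the sample values are hypothetical, and only the offset arithmetic is taken from the snippet quoted above. It assumes the recovery loop runs once per element of `matrix_spots`, which is what this report describes.

```python
def shift(ent, char_offset, tok_offset):
    """Return a copy of ent with both spans shifted by the given offsets."""
    return {
        **ent,
        "char_span": [ent["char_span"][0] + char_offset, ent["char_span"][1] + char_offset],
        "tok_span": [ent["tok_span"][0] + tok_offset, ent["tok_span"][1] + tok_offset],
    }


def recover_inside_loop(ent_list, matrix_spots, char_offset, tok_offset):
    """Placement described in this report: offsets are re-applied per spot."""
    ents = list(ent_list)
    for _sp in matrix_spots:
        # ... head-link decoding would happen here ...
        ents = [shift(ent, char_offset, tok_offset) for ent in ents]
    return ents


def recover_outside_loop(ent_list, matrix_spots, char_offset, tok_offset):
    """Proposed placement: offsets are applied exactly once, after the loop."""
    for _sp in matrix_spots:
        pass  # ... head-link decoding would happen here ...
    return [shift(ent, char_offset, tok_offset) for ent in ent_list]


# Hypothetical sample: one entity in a window that starts at char 100 / token 30.
ent_list = [{"type": "Drug", "text": "SAR444245", "tok_span": [3, 8], "char_span": [10, 19]}]
matrix_spots = [(0, 1, 0), (1, 2, 0), (2, 3, 0)]  # three placeholder spots

buggy = recover_inside_loop(ent_list, matrix_spots, char_offset=100, tok_offset=30)
fixed = recover_outside_loop(ent_list, matrix_spots, char_offset=100, tok_offset=30)

print(buggy[0]["char_span"])  # [310, 319]: offset added three times
print(fixed[0]["char_span"])  # [110, 119]: offset added once
```

With three spots the first variant adds each offset three times, which would be consistent with spans that run past the end of the text as in the output above. With an empty `matrix_spots` it would add no offset at all.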