Information-Extraction-Chinese: Questions about the source code

Questions about the source code

Open wuyunxiangwyx opened this issue 6 years ago • 1 comments

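Hello, I have been studying the source code of your NER project recently and found two issues.

1. In the train() method of main.py:

    update_tag_scheme(train_sentences, FLAGS.tag_schema)
    update_tag_scheme(test_sentences, FLAGS.tag_schema)

The source code updates the tags only for the training set and the test set, not for the validation set. Does this affect the selection of the best model?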

2. Also, iob2(tags) in data_utils.py contains this check:

    if len(split) != 2 or split[0] not in ['I', 'B']:
        return False

If the training set is in IOBES format and contains S-XXX and E-XXX tags (for example, E-LOC), this function returns False, which makes update_tag_scheme(sentences, tag_scheme) in loader.py raise an exception:

    if not iob2(tags):
        s_str = '\n'.join(' '.join(w) for w in s)
        raise Exception('Sentences should be given in IOB format! ' +
                        'Please check sentence %i:\n%s' % (i, s_str))
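The rejection described in point 2 can be reproduced in isolation. The sketch below copies only the prefix condition quoted above into a hypothetical helper, `prefix_accepted` (not a function from the repository), and runs it over a small IOBES-tagged sentence invented for illustration.

```python
def prefix_accepted(tag, allowed=("I", "B")):
    """Mirror the quoted check: 'O' passes, anything else needs an allowed prefix.

    Hypothetical helper for illustration; it is not part of data_utils.py.
    """
    if tag == "O":
        return True
    split = tag.split("-")
    return len(split) == 2 and split[0] in allowed


# Invented IOBES-tagged sentence: a two-token LOC span and a single-token PER.
iobes_tags = ["B-LOC", "E-LOC", "O", "S-PER"]

rejected = [tag for tag in iobes_tags if not prefix_accepted(tag)]
print(rejected)  # ['E-LOC', 'S-PER']
```

Any sentence containing at least one E- or S- tag therefore fails the check, which matches the exception reported above.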

My understanding is limited and I may have misread the source code. I would appreciate your reply. Thank you!
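On the first question, a small experiment shows what can go wrong when one split keeps a different tag scheme. The sketch assumes the data files are IOB2 and the configured scheme is IOBES; `iob2_to_iobes` is a toy converter written for this example, not the repository's implementation, and the tags are invented. Whether the mismatch matters in practice depends on how the validation set is loaded and scored, which this thread does not show.

```python
def iob2_to_iobes(tags):
    """Convert one sentence's IOB2 tags to IOBES (toy helper, not the repo's code)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = next_tag == "I-" + label
        if prefix == "B":
            # A B- token with no continuation is a single-token entity.
            out.append(("B-" if entity_continues else "S-") + label)
        else:  # prefix == "I"
            # An I- token with no continuation closes the entity.
            out.append(("I-" if entity_continues else "E-") + label)
    return out


# Gold validation tags left in IOB2 because the split was never converted.
dev_gold = ["B-LOC", "I-LOC", "O", "B-PER"]

# What a model trained on IOBES-converted data would ideally predict.
ideal_prediction = iob2_to_iobes(dev_gold)

mismatches = [(g, p) for g, p in zip(dev_gold, ideal_prediction) if g != p]
print(mismatches)  # [('I-LOC', 'E-LOC'), ('B-PER', 'S-PER')]
```

Under these assumptions, even a perfect prediction disagrees with the unconverted gold tags at the token level, so passing all three splits through the same conversion is the safer choice. If the data are already IOB2 and the scheme is "iob", the conversion changes nothing and the omission is harmless.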

Nov 06 '18 09:11 wuyunxiangwyx

    def iob2(tags):
        for i, tag in enumerate(tags):
            if tag == 'O':
                continue
            split = tag.split('-')
            if len(split) != 2 or split[0] not in ['I', 'B', 'E', 'S']:
                return False
            # B, E and S tags are accepted unchanged
            if split[0] in ('B', 'E', 'S'):
                continue
            if i == 0 or tags[i - 1] == 'O':  # conversion IOB1 to IOB2
                tags[i] = 'B' + tag[1:]
            elif tags[i - 1][1:] == tag[1:]:
                continue
            else:  # conversion IOB1 to IOB2
                tags[i] = 'B' + tag[1:]
        return True

My training set is also in IOBES format, so I patched the function this way.
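A patch like this lets the format check pass, but this thread does not show whether the later steps in update_tag_scheme accept E- and S- tags. An alternative that avoids touching iob2 is to normalize IOBES data to IOB2 before loading it. The function below, `iobes_to_iob2`, is a hypothetical helper written for this sketch, not part of the repository.

```python
def iobes_to_iob2(tags):
    """Map IOBES tags to IOB2: S- becomes B-, E- becomes I-, the rest is unchanged.

    Hypothetical preprocessing helper; not part of the repository.
    """
    converted = []
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            prefix = "B"
        elif prefix == "E":
            prefix = "I"
        elif prefix not in ("B", "I"):
            raise ValueError("Unexpected tag: %r" % tag)
        converted.append(prefix + "-" + label)
    return converted


print(iobes_to_iob2(["B-LOC", "E-LOC", "O", "S-PER"]))
# ['B-LOC', 'I-LOC', 'O', 'B-PER']
```

The output contains only B-, I- and O tags, so it satisfies the unpatched check quoted in the original question.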

Nov 21 '18 06:11 jianminli55
