
Questions about the source code

Open wuyunxiangwyx opened this issue 6 years ago • 1 comments

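Hello, I have been studying the source code of your NER project recently and found two issues.

1. In the train() method of main.py:

    update_tag_scheme(train_sentences, FLAGS.tag_schema)
    update_tag_scheme(test_sentences, FLAGS.tag_schema)

The source code updates the tags only for the training set and the test set, not for the validation set. Does this affect the selection of the best model?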

2. Also, in iob2(tags) in data_utils.py:

    if len(split) != 2 or split[0] not in ['I', 'B']:
        return False

If the training set is in IOBES format and contains S-XXX and E-XXX tags (for example, E-LOC), this function returns False, which causes update_tag_scheme(sentences, tag_scheme) in loader.py to raise an exception:

    if not iob2(tags):
        s_str = '\n'.join(' '.join(w) for w in s)
        raise Exception('Sentences should be given in IOB format! ' +
                        'Please check sentence %i:\n%s' % (i, s_str))
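The failure described in point 2 can be reproduced in isolation. The sketch below is a reconstruction of the check from the fragments quoted in this thread, not a copy of the repository's data_utils.py; the name `iob2_original` is hypothetical and used only for illustration.

```python
def iob2_original(tags):
    """Validate IOB tags and convert IOB1 to IOB2 in place.

    Reconstructed from the snippets quoted in this thread (an assumption,
    not the repository's exact code). Only the I- and B- prefixes pass.
    """
    for i, tag in enumerate(tags):
        if tag == 'O':
            continue
        split = tag.split('-')
        # Any prefix other than I or B (for example E or S) is rejected here.
        if len(split) != 2 or split[0] not in ['I', 'B']:
            return False
        if split[0] == 'B':
            continue
        elif i == 0 or tags[i - 1] == 'O':
            # conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
        elif tags[i - 1][1:] == tag[1:]:
            continue
        else:
            # conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
    return True


print(iob2_original(['B-LOC', 'I-LOC', 'O']))           # True
print(iob2_original(['B-LOC', 'E-LOC', 'O', 'S-PER']))  # False
```

The second call returns False as soon as it reaches E-LOC, which is the condition that makes the caller raise the "Sentences should be given in IOB format!" exception.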

My experience is limited, so I may have misunderstood the source code. I hope you can clarify. Thank you!

wuyunxiangwyx avatar Nov 06 '18 09:11 wuyunxiangwyx
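On point 1, it may help to see how far the two tag schemes diverge for the same sentence. The sketch below uses a hypothetical helper, `iob_to_iobes`, written for this illustration; it is not the repository's conversion function. Whether an unconverted validation set actually changes model selection depends on how the evaluation code handles the tags, which this thread does not establish.

```python
def iob_to_iobes(tags):
    """Convert IOB2 tags to IOBES (hypothetical helper, for illustration only)."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            converted.append(tag)
            continue
        prefix, label = tag.split('-', 1)
        # The entity continues only if the next tag is I- with the same label.
        continues = i + 1 < len(tags) and tags[i + 1] == 'I-' + label
        if prefix == 'B':
            # A B- tag with no continuation is a single-token entity.
            converted.append(tag if continues else 'S-' + label)
        else:
            # An I- tag with no continuation closes the entity.
            converted.append(tag if continues else 'E-' + label)
    return converted


dev_tags = ['B-LOC', 'I-LOC', 'O', 'B-PER']
print(iob_to_iobes(dev_tags))  # ['B-LOC', 'E-LOC', 'O', 'S-PER']
```

Two of the four gold labels differ after conversion, so a validation set left in IOB would carry different labels from a training set converted to IOBES.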

    for i, tag in enumerate(tags):
        if tag == 'O':
            continue
        split = tag.split('-')
        if len(split) != 2 or split[0] not in ['I', 'B', 'E', 'S']:
            return False
        # B-, E- and S- tags are accepted as they are
        if split[0] in ['B', 'E', 'S']:
            continue
        elif i == 0 or tags[i - 1] == 'O':  # conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
        elif tags[i - 1][1:] == tag[1:]:
            continue
        else:  # conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
    return True

My training set is also in IOBES format, so this is how I patched it.

jianminli55 avatar Nov 21 '18 06:11 jianminli55
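For anyone who wants to try the patch above outside the project, here is a self-contained sketch that wraps the loop in a function and exercises it on both IOB1 and IOBES input. The function name `iob2_accepting_iobes` is hypothetical. The sketch covers only this validation step; whether later steps in the pipeline accept tags that are already in IOBES is not established in this thread.

```python
def iob2_accepting_iobes(tags):
    """Validate tags, accepting IOBES prefixes, and convert IOB1 to IOB2 in place.

    A standalone wrapper around the patch posted in this thread
    (hypothetical name, not part of the repository).
    """
    for i, tag in enumerate(tags):
        if tag == 'O':
            continue
        split = tag.split('-')
        if len(split) != 2 or split[0] not in ('I', 'B', 'E', 'S'):
            return False
        # B-, E- and S- tags need no conversion.
        if split[0] in ('B', 'E', 'S'):
            continue
        if i == 0 or tags[i - 1] == 'O':
            # An I- tag that opens an entity becomes B- (IOB1 to IOB2).
            tags[i] = 'B' + tag[1:]
        elif tags[i - 1][1:] == tag[1:]:
            continue
        else:
            # An I- tag following a different label also opens an entity.
            tags[i] = 'B' + tag[1:]
    return True


tags = ['I-LOC', 'I-LOC', 'O', 'S-PER', 'B-ORG', 'E-ORG']
print(iob2_accepting_iobes(tags))  # True
print(tags)  # ['B-LOC', 'I-LOC', 'O', 'S-PER', 'B-ORG', 'E-ORG']
```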