WordSegment

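Chinese word segmentation based on algorithms including maximum matching (forward, backward, bidirectional), HMM, and n-gram (max-probability n-gram, biward n-gram), etc. Implementations of Chinese word segmentation algorithms, including forward maximum matching, backward maximum matching, maximum...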

Results 4 WordSegment issues

```
# Compute transition probabilities
Trans_dict = self.load_model(word_trans_path)
for pre_word, post_info in Trans_dict.items():
    for post_word, count in post_info.items():
        word_pair = pre_word + ' ' + post_word
        self.trans_dict_count[word_pair] = float(count)
        if pre_word in self.word_dict_count.keys():...
```

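Around line 106 there is this block:

    if char_list[i] not in emit_dict[line_status[i]]:  # if the current character is not yet in the emission matrix, add it
        emit_dict[line_status[i]][char_list[i]] = 0.0
        # emit_dict[line_status[i]][char_list[i]] = 1.0 --> state line_status[i] has already emitted observation char_list[i], so it should be initialized to 1.0
    else:
        emit_dict[line_status[i]][char_list[i]] += 1  # used to compute the emission probability

I think it should be initialized to 1.0 instead, shouldn't it?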
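To make the difference concrete, here is a minimal sketch of the counting loop that the issue describes. It is not the repository's code: the function `count_emissions`, its `first_value` parameter, and the toy sentence with its B/E/S tags are all hypothetical, introduced only to compare the two initial values.

```python
from collections import defaultdict


def count_emissions(char_list, line_status, first_value):
    """Count how often each state emits each character.

    first_value is the count stored the first time a character is seen
    under a state (hypothetical parameter, used only for this comparison).
    """
    emit_dict = defaultdict(dict)
    for state, char in zip(line_status, char_list):
        if char not in emit_dict[state]:
            # First occurrence of this (state, character) pair.
            emit_dict[state][char] = first_value
        else:
            emit_dict[state][char] += 1
    return {state: dict(chars) for state, chars in emit_dict.items()}


# Toy tagged sentence (assumed data): B = word begin, E = word end, S = single.
chars = list("我们是我们")
states = ["B", "E", "S", "B", "E"]

as_written = count_emissions(chars, states, first_value=0.0)
as_proposed = count_emissions(chars, states, first_value=1.0)

print(as_written)   # "是" is seen once under S but is counted as 0.0
print(as_proposed)  # every count equals the true number of occurrences
```

With 0.0 as the initial value, every count comes out one lower than the real number of occurrences, so a character seen exactly once under a state ends with a count of zero. Whether that matters downstream depends on how the counts are later normalized or smoothed, which the excerpt in the issue does not show.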

https://github.com/liuhuanyong/WordSegment/blob/b0486271215a08f4cf859689ecc10974cde799d6/max_ngram.py#L49