nlp-zero: Allow Word_Finder to be trained incrementally even after the "find" method has been called

Open luoy2 opened this issue 4 years ago • 0 comments

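In the Word_Finder class, words is a defaultdict(int), but calling find() rebinds that attribute to a plain dict. Likewise, remove_weak_pairs replaces what was a defaultdict(int) with a dict. As a result, a Word_Finder that has already been trained can no longer be trained further. To reproduce: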

>>> from nlp_zero import Word_Finder
... import logging
... import pandas as pd
... logging.basicConfig(level = logging.INFO, format = '%(asctime)s - %(name)s - %(message)s')
... test_sents = ['陆陆续续写了几篇最小熵原理的博客，致力于无监督做NLP的一些基础工作。',
...               '为了方便大家实验，把文章中涉及到的一些算法封装为一个库，供有需要的读者测试使用。',
...               '由于面向的是无监督NLP场景，而且基本都是NLP任务的基础工作，因此命名为nlp zero。']
... class D:
...     def __iter__(self):
...         for l in test_sents:
...             yield l.strip() # Python 2.x also needs an encoding conversion here
... 
... f = Word_Finder(min_proba=1e-8)
... f.train(D()) # first pass: collect mutual-information statistics
... f.find(D()) # build the vocabulary
... f.train(D()) # second pass: this call raises the KeyError shown below
... 
2020-05-20 12:46:25,923 - 统计频数 - 共统计了 3 个句子
2020-05-20 12:46:25,924 - 词库构建 - 共处理了 3 个句子
Traceback (most recent call last):
  File "<input>", line 16, in <module>
  File "C:\ProgramData\Anaconda3\envs\tf2_gpu\lib\site-packages\nlp_zero\nlp_zero.py", line 311, in train
    self.pairs[s[i:i + 2]] += 1
KeyError: '陆陆'
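
The failure mode can be shown without nlp_zero. The sketch below is a stand-in, not the library's code: count_pairs is a hypothetical helper that mirrors the `self.pairs[s[i:i + 2]] += 1` line from the traceback, and the filtering step is assumed to rebuild the counter with a dict comprehension, which always yields a plain dict.

```python
from collections import defaultdict


def count_pairs(sentences, pairs):
    """Count adjacent character pairs into the mapping `pairs` (hypothetical helper)."""
    for s in sentences:
        for i in range(len(s) - 1):
            pairs[s[i:i + 2]] += 1  # needs a default value for unseen keys
    return pairs


# First training pass works because the counter is a defaultdict(int).
pairs = count_pairs(["陆陆续续"], defaultdict(int))

# Assumed filtering step: a dict comprehension returns a plain dict,
# so the default-value behaviour is lost (here every pair is dropped).
filtered = {k: v for k, v in pairs.items() if v >= 2}

# Second training pass on the plain dict fails on the first unseen key.
try:
    count_pairs(["陆陆续续"], filtered)
    error = None
except KeyError as exc:
    error = exc

print(type(filtered).__name__, repr(error))
```

Any key removed by the filter, or never seen before, triggers the error, because `+=` on a plain dict first has to read the missing key.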

After the change, only the dictionary's values are updated; the dictionary's type is no longer changed.
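
One way to keep the type intact is to modify the existing mapping in place instead of rebinding the attribute to a new dict. The following is only a sketch of that idea under assumed names: prune_in_place and min_count are hypothetical and are not part of nlp_zero, and the actual patch may be written differently.

```python
from collections import defaultdict


def prune_in_place(counter, min_count):
    """Delete weak entries from `counter` without replacing the object.

    Hypothetical helper: the mapping keeps its original type
    (for example defaultdict(int)), so later `+= 1` updates still work.
    """
    # Collect the keys first; a dict must not change size while being iterated.
    weak_keys = [k for k, v in counter.items() if v < min_count]
    for key in weak_keys:
        del counter[key]
    return counter


words = defaultdict(int, {"无监督": 3, "监督做": 1})
prune_in_place(words, min_count=2)

# Incremental training can continue: unseen keys start from 0 again.
words["基础"] += 1
print(type(words).__name__, dict(words))
```

An equivalent alternative is to wrap the filtered result again, for example `defaultdict(int, filtered)`, which also restores the default-value behaviour.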

May 20 '20 05:05 luoy2
