Chinese-Word-Vectors
100+ Chinese Word Vectors: over one hundred pretrained Chinese word embeddings
Hello, and many thanks for this work. I have a question: taking the Baidu Baike word_char vectors as an example, how were the character vocabulary and the word vocabulary constructed? Were characters and words trained separately? And why are there so many characters? They seem to cover essentially all commonly used Chinese characters.
I have downloaded the 300d word vectors from the list you have shown, but I am confused about how to use them with the command "python ana_eval_dense.py -v -a CA8/morphological.txt". Could you give...
The documentation does not say whether the vectors are simplified Chinese, traditional Chinese, or both.
I used the toolkit to evaluate the vectors and got the results. However, I wonder if you could tell us what kind of value is the signal of the...
Question about the word-segmentation dictionary
Hello. Many thanks to the authors for the evaluation corpora and word vectors. Some of these vectors score far higher than my self-trained ones, so I would like to use them for semantic-similarity applications. Here is the problem: HanLP's default dictionary does not contain some of the words in CA8. I would like to merge them into the existing dictionary by aggregating and deduplicating, but the word-frequency and part-of-speech information is missing. Could you share, e.g. via a cloud drive, the dictionary built from the Baidu Baike corpus?
Hello, I noticed that the Mixed-large PPMI vectors in the README have no download link. Is this set of word vectors available? If so, could you provide a download address? Many thanks!
Code:
model = KeyedVectors.load_word2vec_format('/xx/ppmi.baidubaike.word', binary=False, unicode_errors='ignore')
Error:
File "xx/miniconda/envs/py39-tf29/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1980, in _word2vec_line_to_vector
    word, weights = parts[0], [datatype(x) for x in parts[1:]]
File "xx/miniconda/envs/py39-tf29/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1980, in
    word, weights =...
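A likely cause, sketched below under an assumption: the ppmi.* files in this project appear to hold sparse vectors, one line per word in a "word index:value index:value ..." layout, whereas gensim's load_word2vec_format expects dense "word v1 v2 ..." lines, so its float conversion chokes on an "index:value" token (matching the traceback above). The sample line here is made up for illustration; check the file's actual first lines before relying on this.

```python
# Hypothetical sparse PPMI line: "word dim1:value1 dim2:value2 ..."
line = "中国 3:1.25 17:0.80 42:2.10"

parts = line.split(" ")
word = parts[0]

# A dense loader would attempt float("3:1.25") here, which raises ValueError.
try:
    float(parts[1])
except ValueError:
    pass  # this is where gensim's dense parser fails on sparse input

# Parse the "index:value" tokens into a {dimension: weight} mapping instead.
sparse_vec = {}
for tok in parts[1:]:
    idx, val = tok.split(":")
    sparse_vec[int(idx)] = float(val)

print(word)        # 中国
print(sparse_vec)  # {3: 1.25, 17: 0.8, 42: 2.1}
```

If the file really is sparse, a hand-rolled parser like this (or the repo's own evaluation scripts for sparse vectors) is needed; load_word2vec_format only fits the dense SGNS releases.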
How to load the model
Hello. When I try to load your Chinese word-vector model with the code below:
# load the Chinese and English word-vector models
ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
it fails with the following error. How should I fix this?
Traceback (most recent call last):
File "c:/Users/11323/Desktop/score_comment/socore_comments.py", line 127, in
    ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1719, in load_word2vec_format
    return _load_word2vec_format(...
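The likely fix: the .txt releases are in word2vec text format (a header line "vocab_size dim", then one "word v1 v2 ..." line per word), so they should be loaded with binary=False rather than binary=True. The sketch below parses such a file by hand to show the layout; the tiny in-memory sample is made up for illustration.

```python
# Minimal sketch of the word2vec *text* format that the .txt files use.
import io

sample = io.StringIO(
    "2 3\n"              # header: 2 words, 3 dimensions
    "你好 0.1 0.2 0.3\n"  # one word per line: token, then `dim` floats
    "世界 0.4 0.5 0.6\n"
)

vocab_size, dim = (int(x) for x in sample.readline().split())

vectors = {}
for line in sample:
    parts = line.rstrip("\n").split(" ")
    word, weights = parts[0], [float(x) for x in parts[1:]]
    assert len(weights) == dim
    vectors[word] = weights

print(len(vectors))    # 2
print(vectors["你好"])  # [0.1, 0.2, 0.3]
```

With gensim, the equivalent is KeyedVectors.load_word2vec_format(path, binary=False) (binary=False is also the default); binary=True tells gensim to expect the packed-bytes word2vec format and fails on a plain-text file.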