Chinese-Word-Vectors
100+ Chinese Word Vectors: over one hundred pretrained Chinese word embeddings
Hello, and many thanks for this work. I have a question: taking the Baidu Baike word_char vectors as an example, how were the character vocabulary and the word vocabulary constructed? Were characters and words trained separately? And why are there so many characters? They seem to cover essentially all commonly used Chinese characters.
I have downloaded the 300d word vectors from the list you have shown, but I am confused about how to use them with the command "python ana_eval_dense.py -v -a CA8/morphological.txt". Could you give...
The documentation does not say whether the vectors are simplified Chinese, traditional Chinese, or both.
I used the toolkit to evaluate the vectors and got the results. However, I wonder if you could tell us what kind of value is the signal of the...
Question about the word-segmentation dictionary
Hello. Many thanks to the authors for the evaluation corpora and word vectors. Some of these vectors score far higher than my self-trained ones, so I would like to use them for semantic-similarity applications. Here is the problem: HanLP's default dictionary does not contain some of the words in CA8. I would like to merge them into the existing dictionary by aggregating and deduplicating, but the word-frequency and part-of-speech information is missing. Could you share, e.g. via a cloud drive, the dictionary built from the Baidu Baike corpus?
Hello, I noticed that the Mixed-large PPMI vectors in the README have no download link. Is this set of word vectors available? If so, could you provide a download address? Many thanks!
Code:
model = KeyedVectors.load_word2vec_format('/xx/ppmi.baidubaike.word', binary=False, unicode_errors='ignore')
Error:
File "xx/miniconda/envs/py39-tf29/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1980, in _word2vec_line_to_vector
    word, weights = parts[0], [datatype(x) for x in parts[1:]]
File "xx/miniconda/envs/py39-tf29/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1980, in
    word, weights =...
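A likely cause, sketched below under an assumption: the ppmi.* files in this project appear to hold sparse vectors, one line per word in a "word index:value index:value ..." layout, whereas gensim's load_word2vec_format expects dense "word v1 v2 ..." lines, so its float conversion chokes on an "index:value" token (matching the traceback above). The sample line here is made up for illustration; check the file's actual first lines before relying on this.

```python
# Hypothetical sparse PPMI line: "word dim1:value1 dim2:value2 ..."
line = "中国 3:1.25 17:0.80 42:2.10"

parts = line.split(" ")
word = parts[0]

# A dense loader would attempt float("3:1.25") here, which raises ValueError.
try:
    float(parts[1])
except ValueError:
    pass  # this is where gensim's dense parser fails on sparse input

# Parse the "index:value" tokens into a {dimension: weight} mapping instead.
sparse_vec = {}
for tok in parts[1:]:
    idx, val = tok.split(":")
    sparse_vec[int(idx)] = float(val)

print(word)        # 中国
print(sparse_vec)  # {3: 1.25, 17: 0.8, 42: 2.1}
```

If the file really is sparse, a hand-rolled parser like this (or the repo's own evaluation scripts for sparse vectors) is needed; load_word2vec_format only fits the dense SGNS releases.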
How to load the model
Hello. When I try to load your Chinese word-vector model with the code below:
# load the Chinese and English word-vector models
ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
it fails with the following error. How should I fix this?
Traceback (most recent call last):
File "c:/Users/11323/Desktop/score_comment/socore_comments.py", line 127, in
    ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1719, in load_word2vec_format
    return _load_word2vec_format(...
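The likely fix: the .txt releases are in word2vec text format (a header line "vocab_size dim", then one "word v1 v2 ..." line per word), so they should be loaded with binary=False rather than binary=True. The sketch below parses such a file by hand to show the layout; the tiny in-memory sample is made up for illustration.

```python
# Minimal sketch of the word2vec *text* format that the .txt files use.
import io

sample = io.StringIO(
    "2 3\n"              # header: 2 words, 3 dimensions
    "你好 0.1 0.2 0.3\n"  # one word per line: token, then `dim` floats
    "世界 0.4 0.5 0.6\n"
)

vocab_size, dim = (int(x) for x in sample.readline().split())

vectors = {}
for line in sample:
    parts = line.rstrip("\n").split(" ")
    word, weights = parts[0], [float(x) for x in parts[1:]]
    assert len(weights) == dim
    vectors[word] = weights

print(len(vectors))    # 2
print(vectors["你好"])  # [0.1, 0.2, 0.3]
```

With gensim, the equivalent is KeyedVectors.load_word2vec_format(path, binary=False) (binary=False is also the default); binary=True tells gensim to expect the packed-bytes word2vec format and fails on a plain-text file.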