DeepCoNN: Question about google.bin


Open burette opened this issue 4 years ago • 7 comments

Hello. In your code there is a commented-out line, #tf.flags.DEFINE_string("word2vec", "./data/rt-polaritydata/google.bin", "Word2vec file with pre-trained embeddings (default: None)"). Is this google.bin file Google's GoogleNews-vectors-negative300.bin?

Dec 02 '19 09:12 burette

Have you tried the negative300.bin file?

Dec 06 '19 01:12 Aliang-CN

> Have you tried the negative300.bin file?

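Yes, I have tried it. What I used is exactly the pre-trained GoogleNews-vectors-negative300.bin. The original code targets Python 2.7, while I am on Python 3.5. Under Python 3, the part of the original code that reads this file fails with a memory overflow. Under Python 3, use the following snippet to read the words from negative300.bin:

    for line in tqdm(range(vocab_size)):
        # word = []
        # while True:
        #     ch = f.read(1)
        #     if ch == b' ':
        #         # word = ''.join(word)
        #         break
        #     if ch != b'\n':
        #         word.append(ch)
        word = b''
        while True:
            # read one byte at a time until the space that ends the word
            ch = f.read(1)
            if ch == b' ':
                break
            word += ch

With this change the whole pipeline runs through.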

Dec 06 '19 03:12 burette
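
For reference, a fuller sketch of such a reader under Python 3 might look like the following. It assumes the usual word2vec binary layout: a text header line with the vocabulary size and the vector dimension, then for each entry the word, a single space, and the vector as 32-bit floats. The function name `read_word2vec_bin` and its `limit` parameter are hypothetical and not part of the DeepCoNN code. Compared with the snippet in the comment above, it skips a newline in front of a word, raises on end of file instead of looping forever, and decodes each word to `str`.

```python
import struct


def read_word2vec_bin(path, limit=None):
    """Read a word2vec-style binary file into a dict of word -> tuple of floats.

    Hypothetical helper; assumes little-endian float32 vectors.
    """
    vectors = {}
    with open(path, "rb") as f:
        # Header is one text line: b"<vocab_size> <vector_size>\n"
        vocab_size, dim = map(int, f.readline().split())
        count = vocab_size if limit is None else min(vocab_size, limit)
        for _ in range(count):
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("file ended in the middle of a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline some files write after each vector
                    word += ch
            raw = f.read(4 * dim)
            if len(raw) != 4 * dim:
                raise EOFError("file ended in the middle of a vector")
            # Decode to str so that lookups with ordinary Python 3 strings match.
            key = word.decode("utf-8", errors="ignore")
            vectors[key] = struct.unpack("<%df" % dim, raw)
    return vectors
```

Holding the full GoogleNews file in plain tuples would use a lot of memory, so in practice `limit` (or a NumPy array per vector) is probably the more realistic choice.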

Have you tried reading the bin file with gensim?

Dec 09 '19 01:12 Aliang-CN

Reading the file with your method is too slow; it takes 3 hours.

Dec 09 '19 02:12 Aliang-CN

> Reading the file with your method is too slow; it takes 3 hours.

Could the three hours be down to machine performance? On my side, several machines (all i5) finish reading it in a few minutes.

Dec 09 '19 02:12 burette

> Have you tried reading the bin file with gensim?

    from gensim.models.keyedvectors import KeyedVectors
    model = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True, limit=300000)

Dec 09 '19 02:12 burette

I tried both methods. I iterated over the words in vocabulary_user and found that none of them are in the model. What do you see on your side?

Dec 09 '19 03:12 Aliang-CN
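
One thing that might be worth ruling out here, assuming the embeddings came from the byte-by-byte reader earlier in the thread: that snippet builds each word as `bytes`, and in Python 3 `b'good'` never equals `'good'`, so every `str` lookup would miss. The sketch below illustrates the effect with made-up data. The `coverage` helper and the sample entries are hypothetical; only the name `vocabulary_user` comes from the thread, and its real contents are not known here.

```python
def coverage(vocabulary, embeddings):
    """Split vocabulary words into those found and those missing in embeddings."""
    found, missing = [], []
    for word in vocabulary:
        (found if word in embeddings else missing).append(word)
    return found, missing


# Hypothetical data: keys left as bytes, as a byte-by-byte reader would produce.
bytes_keyed = {b"good": (0.1, 0.2), b"movie": (0.3, 0.4)}
vocabulary_user = ["good", "movie", "deepconn"]

found, missing = coverage(vocabulary_user, bytes_keyed)
print(len(found), "found,", len(missing), "missing with bytes keys")

# Decoding the keys to str once makes them comparable with the vocabulary.
str_keyed = {k.decode("utf-8", errors="ignore"): v for k, v in bytes_keyed.items()}
found, missing = coverage(vocabulary_user, str_keyed)
print(len(found), "found,", len(missing), "missing with str keys")
```

The same helper should also work with a gensim `KeyedVectors` object, since it supports the `in` operator. Because gensim already returns `str` keys, a complete miss there would point to a different cause, for example what the entries of `vocabulary_user` actually contain.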
