MatchZoo

Loading word2vec embedding exceeds the memory limit

Open danielwonght opened this issue 5 years ago • 2 comments

Describe the bug

Loading a word2vec embedding causes a memory issue: when the embedding vectors are loaded in string format, they require much more memory.

Solution

Modify the function matchzoo.embedding.load_from_file from:

data = pd.read_csv(file_path, sep=" ", index_col=0, header=None, skiprows=1)

to (this also requires import csv at the top of the module):

data = pd.read_csv(file_path, sep=" ", index_col=0, header=None, skiprows=1, quoting=csv.QUOTE_NONE)
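The effect of the proposed argument can be checked on a tiny in-memory sample. This is only a sketch: SAMPLE and load_vectors are hypothetical names made up for illustration and are not part of MatchZoo, and the sample assumes the usual word2vec text layout (a header line, then one token and its vector per line). With quoting=csv.QUOTE_NONE, pandas treats a quote character as ordinary text, so a token that is a literal double quote stays in its own row and the vector columns are parsed as floats.

```python
import csv
import io

import pandas as pd

# Hypothetical miniature file in word2vec text format (an assumption for
# this sketch): a header line, then one token and its vector per line.
# The second token is a literal double quote.
SAMPLE = (
    "3 3\n"
    "the 0.1 0.2 0.3\n"
    '" 0.4 0.5 0.6\n'
    "cat 0.7 0.8 0.9\n"
)


def load_vectors(source):
    """Read space-separated vectors with the arguments proposed above.

    Hypothetical helper, not the MatchZoo function itself.
    """
    return pd.read_csv(
        source,
        sep=" ",
        index_col=0,             # the token becomes the row label
        header=None,
        skiprows=1,              # skip the header line
        quoting=csv.QUOTE_NONE,  # treat quote characters as plain text
    )


data = load_vectors(io.StringIO(SAMPLE))
print(data.dtypes)  # dtype of each vector column
print(data.shape)   # (rows, vector dimensions)
```

Inspecting data.dtypes on a real embedding file is a quick way to confirm that the vectors were loaded as numbers rather than as strings.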

danielwonght avatar Nov 26 '19 19:11 danielwonght

Would you like to send a PR to fix this issue? @danielwonght

bwanglzu avatar Nov 27 '19 14:11 bwanglzu

Sure.

danielwonght avatar Nov 28 '19 09:11 danielwonght