emnlp2017-bilstm-cnn-crf

embeddings = np.array(embeddings) MemoryError

Open ghost opened this issue 6 years ago • 2 comments

Hi,

I get a MemoryError when I run the NER model using the word2vec embeddings from this link (http://evexdb.org/pmresources/vec-space-models/). However, I am able to run the ELMo-BiLSTM model with these embeddings without any error. Is there any way to fix this issue? My embeddings file is 13.2 GB, whereas I have 16 GB of RAM.

ghost avatar Jan 28 '19 12:01 ghost

13.2 GB for an embedding file is extremely large. Are you sure you need all these embeddings?

You often get really good performance with much smaller embedding files, e.g. with the Komninos embeddings: https://public.ukp.informatik.tu-darmstadt.de/reimers/embeddings/

Or with the GloVe embeddings.

Some embedding files contain many unnecessary entries. The original word2vec embeddings, for example, also contain embeddings for bigrams, which cannot be used in this architecture. The Komninos embeddings from his webpage likewise contain embedding information for dependency relations, which also cannot be used with this architecture.
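As an illustration, a hypothetical pre-filtering step could strip such entries before loading. This sketch assumes a text-format embedding file where bigram entries join their tokens with an underscore (as word2vec does); the function name is mine, not from the repo:

```python
# Hypothetical sketch: drop multi-word (bigram) entries, which word2vec
# marks by joining tokens with an underscore, from a text-format file.
def filter_unigrams(lines):
    """Keep only embedding lines whose token contains no underscore."""
    kept = []
    for line in lines:
        token = line.split(' ', 1)[0]  # token is the first field
        if '_' not in token:
            kept.append(line)
    return kept

sample = [
    "the 0.1 0.2 0.3",
    "New_York 0.4 0.5 0.6",  # bigram entry: filtered out
    "cell 0.7 0.8 0.9",
]
print(filter_unigrams(sample))  # → ['the 0.1 0.2 0.3', 'cell 0.7 0.8 0.9']
```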

If you still want to use your linked embeddings:

The prepareDataset method in util.py has an argument: reducePretrainedEmbeddings=False

Set this argument to True.

With this argument, only the needed embeddings are loaded from disk and kept in memory. Word embeddings that do not appear in train/dev/test are not loaded.
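Conceptually, the reduction works like this sketch (my own simplified illustration, not the repo's actual code): the file is streamed line by line and only vectors for words in the corpus vocabulary are collected, so the matrix handed to np.array() stays small:

```python
import numpy as np

# Simplified sketch of vocabulary-filtered loading: keep only vectors
# for words that occur in train/dev/test, skipping everything else.
def load_reduced(embedding_lines, needed_vocab):
    word2idx, vectors = {}, []
    for line in embedding_lines:
        parts = line.rstrip().split(' ')
        word = parts[0]
        if word in needed_vocab:              # skip out-of-corpus words
            word2idx[word] = len(vectors)
            vectors.append([float(x) for x in parts[1:]])
    return word2idx, np.array(vectors)        # small enough to fit in RAM

lines = ["the 0.1 0.2", "protein 0.3 0.4", "rareword 0.5 0.6"]
word2idx, emb = load_reduced(lines, needed_vocab={"the", "protein"})
print(emb.shape)  # → (2, 2)
```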

nreimers avatar Jan 28 '19 12:01 nreimers

I was able to run the code by setting this argument: reducePretrainedEmbeddings=True

But I am wondering why I was able to do NER with ELMo and the word2vec embeddings (13.2 GB file) without setting that argument to True. Can you help me understand that?

ghost avatar Jan 29 '19 11:01 ghost