Word Embedding - Hebrew
The code behind the blog post: https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
Note: This code was written for Hebrew, but it should work for any other language as well.
- Download the Hebrew dataset from Wikipedia:
  - Go to https://dumps.wikimedia.org/hewiki/latest/
  - Download `hewiki-latest-pages-articles.xml.bz2`

  On Linux this can be done easily with:

  ```
  wget https://dumps.wikimedia.org/hewiki/latest/hewiki-latest-pages-articles.xml.bz2
  ```
- Install gensim (see https://radimrehurek.com/gensim/install.html):

  ```
  pip install --upgrade gensim
  ```

- Run `create_corpus.py`; it will create `wiki.he.text` (a sketch of the script appears after this list):

  ```
  python create_corpus.py
  ```
- Train the model from the Python prompt (a gensim sketch appears in the Word2Vec section below):
  - `import word2vec`
  - `word2vec.train()`
- Explore the model using a Jupyter notebook. You can use the supplied `playingWithHebModel.ipynb` example as a starting point.
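For reference, here is a minimal sketch of what a `create_corpus.py`-style extraction script can look like, built on gensim's `WikiCorpus` (the approach described in the blog post). Argument names follow recent gensim releases, so treat it as an illustration rather than the repo's exact code:

```python
# Sketch: extract plain text from the Wikipedia dump with gensim's WikiCorpus.
# Exact WikiCorpus arguments vary between gensim versions.
from gensim.corpora import WikiCorpus

def make_corpus(inp="hewiki-latest-pages-articles.xml.bz2", out="wiki.he.text"):
    # dictionary={} skips vocabulary building, which is not needed here
    wiki = WikiCorpus(inp, dictionary={})
    with open(out, "w", encoding="utf-8") as f:
        for tokens in wiki.get_texts():  # one tokenized article per iteration
            f.write(" ".join(tokens) + "\n")

if __name__ == "__main__":
    make_corpus()
```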
Word2Vec
- `Train(inp="wiki.he.text", out_model="wiki.he.word2vec.model")`
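The `word2vec.train()` step is, at its core, a gensim `Word2Vec` training run over `wiki.he.text`. A minimal sketch, assuming gensim 4.x argument names (older versions use `size=` instead of `vector_size=`) and illustrative hyperparameter values, not necessarily the repo's:

```python
# Sketch of a word2vec.train()-style function using gensim.
import multiprocessing
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

def train(inp="wiki.he.text", out_model="wiki.he.word2vec.model"):
    model = Word2Vec(
        LineSentence(inp),   # streams the corpus one line (article) at a time
        sg=1,                # sg=1 selects the skip-gram architecture
        vector_size=100,     # gensim 4.x name; older versions use size=
        window=5,
        min_count=5,
        workers=multiprocessing.cpu_count(),
    )
    model.save(out_model)
    return model
```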
FastText
- Install: `pip install fasttext`
- `Train(inp="wiki.he.text", out_model="wiki.he.fasttext.model", alg="skipgram")`
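A corresponding sketch for FastText, using the official `fasttext` Python bindings. `train_unsupervised` is the current API; older releases of the package exposed `fasttext.skipgram(input, output)` instead, so the repo's `Train` wrapper may differ:

```python
# Sketch of FastText training with the fasttext package.
import fasttext

def train_fasttext(inp="wiki.he.text", out_model="wiki.he.fasttext.model",
                   alg="skipgram"):
    model = fasttext.train_unsupervised(inp, model=alg)  # "skipgram" or "cbow"
    model.save_model(out_model + ".bin")  # fastText saves binary .bin models
    return model
```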
Test
Testing specific Hebrew analogies, for example:
- פריז + גרמניה - צרפת = ברלין (Paris + Germany - France = Berlin)
- גבר + מלכה - מלך = אישה (man + queen - king = woman)
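These analogies can be checked directly with gensim's `most_similar` (the supplied notebook does the equivalent): for an analogy a + b - c, pass a and b as `positive` and c as `negative`. A minimal sketch, assuming the Word2Vec model saved above:

```python
# Sketch: query the trained model for the two analogies above.
from gensim.models import Word2Vec

model = Word2Vec.load("wiki.he.word2vec.model")

# Paris + Germany - France, expecting Berlin:
print(model.wv.most_similar(positive=["פריז", "גרמניה"], negative=["צרפת"], topn=1))

# man + queen - king, expecting woman:
print(model.wv.most_similar(positive=["גבר", "מלכה"], negative=["מלך"], topn=1))
```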