deep-learning-from-scratch-2
How can I write training code for the simple Skip-gram model? I tried to fix your code. Please see this issue!
I want to write code that trains the simple Skip-gram model. When I tried it, unlike the CBOW training code, I ran into an error.
At first I used trainer.py and simple_skip_gram.py, and ran the training code below, written in the same style as train.py (the train.py that works with the SimpleCBOW model).
# Try training with the Skip-gram model
import numpy as np
from common.util import preprocess, create_contexts_target, convert_one_hot
from common.optimizer import Adam
from common.trainer import Trainer
from simple_skipgram import SimpleSkipGram

# 1. Preprocess the corpus
window_size = 1
text = 'You say goodbye and I say Hello.'
corpus, word_to_id, id_to_word = preprocess(text)
contexts, target = create_contexts_target(corpus, window_size)
vocab_size = len(word_to_id)
contexts_ohe = convert_one_hot(contexts, vocab_size)
target_ohe = convert_one_hot(target, vocab_size)

# 2. Set the hyperparameters
hidden_size = 5
batch_size = 3
epochs = 1000

# 3. Build the Skip-gram model
model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# 4. Train
trainer.fit(x=target_ohe,
            t=contexts_ohe, max_epochs=epochs, batch_size=batch_size)
But this code raises the following error:

I found that the error comes from the Matmul class in layer.py, so I modified the original Matmul class as shown below.
class Matmul:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        out = np.matmul(x, W)
        self.x = x
        return out

    def backward(self, dout):
        W, = self.params
        # I added the if statements below:
        # collapse 3-D arrays to 2-D by summing over axis 1
        # before computing the gradients
        if dout.ndim == 3:
            dout = np.sum(dout, axis=1)
        if self.x.ndim == 3:
            self.x = np.sum(self.x, axis=1)
        dx = np.matmul(dout, W.T)
        dW = np.matmul(self.x.T, dout)
        self.grads[0][...] = dW
        return dx
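For reference, the shape difference that triggers those if statements can be seen by printing the one-hot arrays. The sketch below assumes the preprocess, create_contexts_target, and convert_one_hot utilities from common/util behave as in the book, and that the script sits next to the ch03 examples so common/ is importable.

# Shape check for the one-hot arrays (a minimal sketch)
import sys
sys.path.append('..')  # assumes common/ is importable from the parent directory
from common.util import preprocess, create_contexts_target, convert_one_hot

text = 'You say goodbye and I say Hello.'
corpus, word_to_id, id_to_word = preprocess(text)
contexts, target = create_contexts_target(corpus, window_size=1)
vocab_size = len(word_to_id)  # 7 words in this toy corpus

contexts_ohe = convert_one_hot(contexts, vocab_size)
target_ohe = convert_one_hot(target, vocab_size)

print(target_ohe.shape)    # (6, 7)    -> 2-D: one one-hot row per sample
print(contexts_ohe.shape)  # (6, 2, 7) -> 3-D: two context words per sample

Because the fit() call above passes the 3-D contexts array as t, a 3-D array ends up reaching the MatMul layers (hence the ndim == 3 checks in the fix).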
After this fix, the Skip-gram training code runs. But in contrast with SimpleCBOW, the loss stays high and does not decrease, so I want to check whether my code is right. In this situation, is the high loss of my simple Skip-gram model just because the corpus is very small?
If my fix is wrong, how should the original code be revised? Please reply. I am learning a lot from your book. Thanks!
vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)
model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()
This code will work.
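A short note on why this argument order works (a sketch of the reasoning, assuming SimpleSkipGram.forward has the signature forward(contexts, target) as in the book's ch03/simple_skip_gram.py, with target used as the input word and the context words used as labels): Trainer.fit(x, t, ...) feeds mini-batches of x and t to model.forward, so contexts must be passed as x and target as t, exactly as in the SimpleCBOW example. Putting the reply's snippet together with the preamble from the question gives a full script along these lines (the module name simple_skip_gram is an assumption about the file name):

# Full Skip-gram training script with the corrected argument order
# (a sketch assembled from the question's script and the reply above)
import sys
sys.path.append('..')  # assumes common/ and simple_skip_gram.py are importable
from common.util import preprocess, create_contexts_target, convert_one_hot
from common.optimizer import Adam
from common.trainer import Trainer
from simple_skip_gram import SimpleSkipGram  # module name assumed

# Hyperparameters
window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

# Preprocess the corpus and build one-hot arrays
text = 'You say goodbye and I say Hello.'
corpus, word_to_id, id_to_word = preprocess(text)
vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

# Model, optimizer, trainer
model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# Train: contexts go in as x and target as t, the same order as for SimpleCBOW
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

With this ordering the original MatMul class needs no modification, since only 2-D arrays flow through it.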