deep-learning-from-scratch-2

How can I write training code for the simple Skip-gram model? I tried to fix your code, please see this issue!

Open · iamcodingcat opened this issue 3 years ago • 1 comment

I want to write code that trains the simple Skip-gram model on the training data. Unlike with the CBOW training code, I ran into an error when I tried. I started from trainer.py and simple_skip_gram.py and ran the training code below, modeled on train.py (the train.py that works with the SimpleCBOW model):

# Train with the simple Skip-gram model
import numpy as np
from common.util import preprocess, create_contexts_target, convert_one_hot
from common.optimizer import Adam
from common.trainer import Trainer
from simple_skip_gram import SimpleSkipGram

# 1. Preprocess the corpus
window_size = 1

text = 'You say goodbye and I say Hello.'
corpus, word_to_id, id_to_word = preprocess(text)
contexts, target = create_contexts_target(corpus, window_size)

vocab_size = len(word_to_id)
contexts_ohe = convert_one_hot(contexts, vocab_size)
target_ohe = convert_one_hot(target, vocab_size)

# 2. Set the hyperparameters
hidden_size = 5
batch_size = 3
epochs = 1000

# 3. Build the Skip-gram model
model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# 4. Train
trainer.fit(x=target_ohe, 
            t=contexts_ohe, max_epoch=epochs, batch_size=batch_size)

But this code raises an error (screenshot attached: "Screenshot 2021-12-05, 2:50 PM").

I traced the error to the MatMul class in common/layers.py, so I modified the original MatMul class as shown below.

import numpy as np

class MatMul:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        out = np.matmul(x, W)
        self.x = x
        return out

    def backward(self, dout):
        W, = self.params
        # I appended the two if statements below: when a 3-D array reaches
        # this layer, collapse it to 2-D by summing over axis 1 so the
        # matrix multiplications line up.
        if dout.ndim == 3:
            dout = np.sum(dout, axis=1)
        if self.x.ndim == 3:
            self.x = np.sum(self.x, axis=1)
        dx = np.matmul(dout, W.T)
        dW = np.matmul(self.x.T, dout)
        self.grads[0][...] = dW
        return dx
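
For reference, here is a quick shape check of the modified backward with hypothetical toy dimensions (it assumes the MatMul class above is already defined in the session); it shows the added sums collapsing a 3-D input and gradient to 2-D:

import numpy as np

N, C, V, H = 3, 2, 7, 5                  # hypothetical toy sizes: batch, context positions, vocab, hidden
W = np.random.randn(V, H)
layer = MatMul(W)                        # the modified class defined above

x = np.random.randn(N, C, V)             # a 3-D batch, shaped like one-hot contexts (N, 2, V)
out = layer.forward(x)                   # (N, C, H)
dx = layer.backward(np.random.randn(*out.shape))

print(out.shape, dx.shape, layer.grads[0].shape)   # (3, 2, 5) (3, 7) (7, 5)

Note that dx comes back 2-D even though x was 3-D, so the workaround changes the computation rather than preserving the layer's original input/output contract.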

After running this fixed code, the Skip-gram training succeeds, but in contrast with SimpleCBOW the loss value is higher and does not decrease. I want to check whether my code is right. Under these circumstances, is the reason my simple Skip-gram model has such a high loss simply that the corpus is very small?

If my fix is not right, how should the original code be revised? Please reply. I am learning a lot from your book. Thanks!

iamcodingcat · Dec 05 '21 05:12

# Assumes the same imports, preprocessing, and hyperparameters as the question's script
# (with max_epoch = 1000).
vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# Pass contexts as x and target as t; this matches SimpleSkipGram.forward(contexts, target).
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

This code will work.
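
For completeness, a minimal end-to-end sketch of the fixed training script; it assumes the file sits in the ch03 directory of the book's repository (next to simple_skip_gram.py) and reuses the hyperparameters from the question above:

import sys
sys.path.append('..')  # make the common/ package importable, as in the book's chapter scripts
from common.util import preprocess, create_contexts_target, convert_one_hot
from common.optimizer import Adam
from common.trainer import Trainer
from simple_skip_gram import SimpleSkipGram

window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)

vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

model = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# contexts go in as x and target as t, matching SimpleSkipGram.forward(contexts, target)
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

About the higher loss in the question: SimpleSkipGram sums two SoftmaxWithLoss terms (one per context word), so with the 7-word toy vocabulary the loss starts around 2 × ln 7 ≈ 3.9 instead of ln 7 ≈ 1.9 for SimpleCBOW. A larger reported loss than CBOW's is therefore expected and not by itself a sign of a bug.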

haithink · Jan 09 '24 02:01