
GloVe embeddings with Keras

Open csharma opened this issue 4 years ago • 0 comments

Hi,

I'm setting up my Keras layer for GloVe embeddings as follows:

    import string
    import numpy as np
    from keras.layers import Input, Embedding, GRU, Dense
    from keras.models import Model
    from keras.preprocessing.text import Tokenizer

    X = X[:set_size]
    y = y[:set_size]

    # Flatten the rows of X into one long array of tokens
    yy = []
    for i in range(len(X)):
        yy = np.concatenate((yy, X[i, 0:set_size]), axis=0)

    # Join the tokens back into text; note that X is now a single string
    X = "".join([" " + i if not i.startswith("'") and i not in string.punctuation else i for i in yy]).strip()

    t = Tokenizer()
    t.fit_on_texts(X)
    encoded_docs = t.texts_to_sequences(X)
    # pad documents to a max length of 4 words (this value is not used below)
    max_length = 4

    # Load the 50-dimensional GloVe vectors into a word -> vector dictionary
    vocab_size = len(t.word_index) + 1
    embeddings_index = dict()
    with open('glove.6B/glove.6B.50d.txt', encoding='utf8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs
    print('Loaded %s word vectors.' % len(embeddings_index))

    # create a weight matrix for words in training docs
    embedding_matrix = np.zeros((vocab_size, 50))
    for word, i in t.word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

    max_len = 200
    sequence_input = Input(shape=(max_len,), dtype='int32', name="seq_input")

    # Frozen embedding layer initialised with the GloVe weights
    embedding_layer = Embedding(vocab_size, 50, weights=[embedding_matrix], input_length=max_len, trainable=False)

    e = embedding_layer(sequence_input)

I then pass this layer to a GRU neural network.

    testGru = GRU(units=16, return_sequences=False)(e)
    classification = Dense(1, activation='sigmoid')(testGru)
    model = Model(inputs=sequence_input, outputs=classification)
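
Because `seq_input` is declared with `shape=(max_len,)`, the model expects a 2-D integer array of shape `(n_documents, 200)` at fit time. A minimal sketch of building such an array with plain NumPy, used here as a stand-in for Keras's `pad_sequences`; the helper name `pad_to_length` and the sample word indices are hypothetical and not part of the original code:

```python
import numpy as np

def pad_to_length(sequences, max_len, pad_value=0):
    """Right-pad (or truncate) each integer sequence to exactly max_len entries."""
    padded = np.full((len(sequences), max_len), pad_value, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_len]
        padded[row, :len(trimmed)] = trimmed
    return padded

# Made-up word indices standing in for the output of texts_to_sequences
encoded_docs = [[4, 7, 1], [2, 9], [5, 3, 8, 6]]
model_input = pad_to_length(encoded_docs, max_len=200)
print(model_input.shape)
```

This sketch pads and truncates at the end of each sequence, whereas `pad_sequences` pads and truncates at the front by default, so the two are not drop-in equivalents.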

The problem appears when I try to fit the model:

    import sys
    from contextlib import redirect_stdout

    validation_split = int((0.05 * set_size / batch_size)) * batch_size / set_size
    model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])

    try:
        with redirect_stdout(sys.stderr):
            with graph.as_default():
                history = model.fit(np.array(X), y, validation_split=validation_split,
                                    shuffle=True, batch_size=batch_size, epochs=epochs,
                                    verbose=1, callbacks=fit_callbacks)
    except Exception as e:
        raise e
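
The `validation_split` expression rounds a 5% hold-out down so that it corresponds to a whole number of batches. A small worked example of that arithmetic, with `set_size = 10000` and `batch_size = 32` chosen purely for illustration, since the post does not state the real values:

```python
set_size = 10000    # hypothetical number of training samples
batch_size = 32     # hypothetical batch size

# Whole batches that fit into 5% of the data
n_val_batches = int(0.05 * set_size / batch_size)
# Fraction of the data covered by those whole batches
validation_split = n_val_batches * batch_size / set_size

print(n_val_batches, validation_split)  # 15 0.048
```

With these numbers, 5% of the data would be 500 samples, and the expression trims that to 15 batches of 32, or roughly 4.8% of the data.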

I get the following error:

ValueError: Error when checking input: expected seq_input to have 2 dimensions, but got array with shape ()
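
The shape `()` in this message is what NumPy reports for a zero-dimensional array, which is what `np.array` returns when it is given a single Python string. In the code above, `X` is reassigned to the result of `"".join(...)` before it reaches `model.fit(np.array(X), ...)`. A minimal sketch of the difference, using a made-up sentence in place of the real data:

```python
import numpy as np

# Stand-in for X after the "".join(...) step: one string, not a list of documents
joined_text = "the cat sat on the mat"
as_array = np.array(joined_text)
print(as_array.shape, as_array.ndim)   # () 0

# What a layer declared with shape=(200,) expects: (n_documents, 200)
batch = np.zeros((3, 200), dtype="int32")
print(batch.shape, batch.ndim)         # (3, 200) 2
```

If this is indeed the cause, passing a padded integer array built from `encoded_docs` instead of `np.array(X)` would be the likely direction for a fix, though that is an assumption about the intended pipeline.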

Please help

Best regards, Cartik

csharma • Oct 01 '19 19:10