hierarchical-attention-networks
Is the embedding initialized with a pre-trained one?
From the code it seems the embedding is not initialized with a pre-trained embedding (e.g. word2vec), although the paper says it is. Am I right, or did I miss something? Many thanks!
Relevant code in `_init_embedding`:
```python
def _init_embedding(self, scope):  # seems it does not use pre-trained word embeddings
    with tf.variable_scope(scope):
        with tf.variable_scope("embedding") as scope:
            self.embedding_matrix = tf.get_variable(
                name="embedding_matrix",
                shape=[self.vocab_size, self.embedding_size],
                initializer=layers.xavier_initializer(),
                dtype=tf.float32)
            self.inputs_embedded = tf.nn.embedding_lookup(
                self.embedding_matrix, self.inputs)
```
The relevant passage on word embeddings in Section 2.2 of the paper:
> Note that we directly use word embeddings. For a more complete model we could use a GRU to get word vectors directly from characters, similarly to (Ling et al., 2015). We omitted this for simplicity.
So it seems this implementation trains word embeddings specific to this task: the code learns the embedding matrix from scratch (effectively from a one-hot representation of the words) rather than using the character-level GRU representation mentioned above. Since the reported performance is lower than in the original paper, it looks like word2vec embeddings work better than embeddings learned from scratch. I'm currently changing the code so that it supports plugging in word2vec embeddings, and I'll make a pull request soon (see the sketch below).
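For reference, here is a minimal sketch of how pre-trained vectors could be plugged into `_init_embedding` (this is not the actual PR; the `pretrained` and `trainable` parameters are hypothetical, and `pretrained` is assumed to be a numpy array of shape `[vocab_size, embedding_size]` aligned with the model's vocabulary):

```python
import tensorflow as tf
from tensorflow.contrib import layers


def _init_embedding(self, scope, pretrained=None, trainable=True):
    # If a pre-trained matrix (e.g. word2vec) is given, use it as the
    # initializer; otherwise fall back to Xavier initialization as before.
    with tf.variable_scope(scope):
        with tf.variable_scope("embedding"):
            if pretrained is not None:
                assert pretrained.shape == (self.vocab_size, self.embedding_size)
                initializer = tf.constant_initializer(pretrained)
            else:
                initializer = layers.xavier_initializer()
            self.embedding_matrix = tf.get_variable(
                name="embedding_matrix",
                shape=[self.vocab_size, self.embedding_size],
                initializer=initializer,
                dtype=tf.float32,
                trainable=trainable)  # trainable=False would freeze the vectors
            self.inputs_embedded = tf.nn.embedding_lookup(
                self.embedding_matrix, self.inputs)
```

Setting `trainable=False` keeps the word2vec vectors fixed, while leaving it `True` only uses them as initialization and fine-tunes them during training.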
In my experience with other language-related tasks, using pretrained embeddings doesn't make much difference when the dataset is sufficiently large, although I suspect this is very task- and corpus-dependent.
@Sora77 would appreciate the PR!