elmo-bilstm-cnn-crf
Performance on CoNLL-2003
May I ask what performance you obtain with your new implementation?
Hi @allanj, out of the box, without any further tuning and with a rather simple model:

- ELMo 5.5B embeddings: 91.81 +/- 0.19
- ELMo 5.5B embeddings + GloVe word embeddings: 92.07 +/- 0.24
- ELMo 5.5B embeddings + Komninos word embeddings: 92.13 +/- 0.17
Sorry for the late reply, but which layer of the hidden states do you use? The average or the final layer?
I recommend average
Thanks. I also found the weighted_average mode.
In neuralnets/ELMoWordEmbeddings.py, may I ask why the weighted_average mode simply swaps the axes? If I'm not wrong, dimension 0 is the layer and dimension 1 is the token position.
```python
import numpy as np

def applyElmoMode(self, elmo_vectors):
    # elmo_vectors has shape (layers, tokens, dims): the 3 biLM layers come first.
    if self.elmo_mode == 'average':
        # Collapse the layer axis -> (tokens, dims)
        return np.average(elmo_vectors, axis=0).astype(np.float32)
    elif self.elmo_mode == 'weighted_average':
        # Keep all layers, but move tokens first -> (tokens, layers, dims)
        return np.swapaxes(elmo_vectors, 0, 1)
    elif self.elmo_mode == 'last':
        # Top biLM layer only -> (tokens, dims)
        return elmo_vectors[-1, :, :]
    elif isinstance(self.elmo_mode, int):
        # A specific layer index -> (tokens, dims)
        return elmo_vectors[self.elmo_mode, :, :]
    else:
        raise ValueError("Unknown ELMo mode: %s" % self.elmo_mode)
```
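To make the shapes concrete, here is a small sketch of what each mode returns, using a toy array in place of the real ELMo output (3 layers, 5 tokens, 1024 dimensions are assumptions for illustration):

```python
import numpy as np

# Toy stand-in for an ELMo output: (layers, tokens, dims)
elmo_vectors = np.random.randn(3, 5, 1024).astype(np.float32)

# 'average' mode: one vector per token
avg = np.average(elmo_vectors, axis=0).astype(np.float32)
print(avg.shape)       # (5, 1024)

# 'weighted_average' mode: all layers kept, tokens moved to axis 0
weighted = np.swapaxes(elmo_vectors, 0, 1)
print(weighted.shape)  # (5, 3, 1024)

# 'last' mode: top layer only
last = elmo_vectors[-1, :, :]
print(last.shape)      # (5, 1024)
```

So weighted_average does not average anything itself; it only reorders the axes so that each token carries all three layer vectors into the network.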
The weights are added and trained as part of the neural network, so the ElmoEmbeddings class only returns the 3 layers. To be compatible with the input format of the neural network, the axes must be swapped.
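For reference, the learned combination can be sketched as a softmax-normalized scalar mix over the three layers, as described in the ELMo paper. This is only an illustrative NumPy sketch of the forward computation, not the repo's actual trainable layer; the function and parameter names are made up here:

```python
import numpy as np

def scalar_mix(layer_vectors, raw_weights, gamma=1.0):
    """Collapse (layers, tokens, dims) into (tokens, dims) with learned weights.

    raw_weights: one scalar per layer; in the real model these are trainable
    parameters, softmax-normalized so they sum to 1.
    """
    w = np.exp(raw_weights - np.max(raw_weights))
    w = w / w.sum()  # softmax over the layer axis
    # Weighted sum over layers: (layers,) x (layers, tokens, dims) -> (tokens, dims)
    mixed = np.tensordot(w, layer_vectors, axes=([0], [0]))
    return gamma * mixed

layers = np.random.randn(3, 5, 1024).astype(np.float32)
out = scalar_mix(layers, np.zeros(3))
print(out.shape)  # (5, 1024)
```

With all raw weights equal (as at initialization here), the softmax is uniform and the result reduces to the plain layer average; training then moves the weights away from uniform.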