
Performance on CoNLL-2003

Open allanj opened this issue 5 years ago • 5 comments

Can I ask what performance you obtain with your new implementation?

allanj avatar Jan 07 '19 01:01 allanj

Hi @allanj out-of-the-box, without any further tuning and with a rather simple model:

ELMo 5.5B embeddings: 91.81 +/- 0.19
ELMo 5.5B embeddings + GloVe word embeddings: 92.07 +/- 0.24
ELMo 5.5B embeddings + Komninos word embeddings: 92.13 +/- 0.17

nreimers avatar Jan 07 '19 09:01 nreimers

Sorry for the late reply, but which layer of the hidden states do you use? The average, or the final layer?

allanj avatar Mar 22 '19 15:03 allanj

I recommend average

nreimers avatar Mar 22 '19 17:03 nreimers

Thanks. I also found the weighted-average mode in neuralnets/ELMoWordEmbeddings.py. Can I ask why it simply swaps the axes? If I'm not wrong, dimension 0 is the layer and dimension 1 is the token position.

def applyElmoMode(self, elmo_vectors):
    # elmo_vectors has shape (layers, tokens, dims)
    if self.elmo_mode == 'average':
        # Mean over the layer axis: one vector per token
        return np.average(elmo_vectors, axis=0).astype(np.float32)
    elif self.elmo_mode == 'weighted_average':
        # Return all layers, reordered to (tokens, layers, dims)
        return np.swapaxes(elmo_vectors, 0, 1)
    elif self.elmo_mode == 'last':
        # Only the top layer
        return elmo_vectors[-1, :, :]
    elif isinstance(self.elmo_mode, int):
        # A specific layer by index
        return elmo_vectors[self.elmo_mode, :, :]
    else:
        raise ValueError("Unknown ELMo mode: %s" % self.elmo_mode)

allanj avatar Mar 24 '19 08:03 allanj
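For reference, the output shapes of the different modes can be checked with a quick NumPy sketch (the layer/token/dimension sizes below are assumed for illustration, not taken from the repo):

```python
import numpy as np

# Assumed sizes: 3 ELMo layers, 5 tokens, 1024 dimensions
elmo_vectors = np.random.randn(3, 5, 1024)  # (layers, tokens, dims)

average = np.average(elmo_vectors, axis=0)   # mean over layers -> (5, 1024)
weighted = np.swapaxes(elmo_vectors, 0, 1)   # all layers, reordered -> (5, 3, 1024)
last = elmo_vectors[-1, :, :]                # top layer only -> (5, 1024)

print(average.shape, weighted.shape, last.shape)
```

So only the 'weighted_average' mode keeps the layer axis around, which is what the answer below explains.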

The layer weights are added and trained as part of the neural network, so the ELMoWordEmbeddings class only returns all 3 layers. To match the input format the network expects, the layer and token axes must be swapped.

nreimers avatar Mar 24 '19 10:03 nreimers
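To illustrate what the downstream network then does with the (tokens, layers, dims) tensor: the ELMo paper combines the layers with a learned, softmax-normalized scalar weight per layer. A minimal NumPy sketch of that combination (the weight values here are illustrative placeholders, not trained parameters):

```python
import numpy as np

elmo_vectors = np.random.randn(5, 3, 1024)  # (tokens, layers, dims)

# In the real model these logits are trainable parameters;
# the values below are arbitrary for the example.
layer_logits = np.array([0.1, 0.5, -0.2])
weights = np.exp(layer_logits) / np.exp(layer_logits).sum()  # softmax over layers

# Weighted sum over the layer axis -> one 1024-dim vector per token
weighted_avg = np.einsum('tld,l->td', elmo_vectors, weights)
print(weighted_avg.shape)  # (5, 1024)
```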