structshot
Code associated with Figure 3 in the paper
From the original paper
We project token-level representations obtained from the BERT embedders onto a 2-dimensional space using t-SNE.
The paper claims that Figure 3 shows the usefulness of pretraining on OntoNotes, since the clusters are more compact. However, since the word embeddings returned by a transformer model are contextualized, I am wondering how you obtain the embeddings of individual tokens in the test set before applying t-SNE. Do you collect the contextualized embeddings for all occurrences of a token and then average them?
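For reference, here is a minimal sketch of one plausible pipeline, assuming (since the paper does not specify) that subword vectors from the embedder's last hidden layer are averaged to form token-level representations; random vectors stand in for the actual BERT outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for contextualized outputs: one 768-d vector per subword.
# In practice these would come from the BERT embedder's last hidden layer.
hidden = rng.normal(size=(6, 768))

# Hypothetical word -> subword index mapping,
# e.g. "washington" -> ["wash", "##ington"] -> [1, 2].
word_to_subwords = [[0], [1, 2], [3], [4, 5]]

# Assumed choice: average subword vectors into one token-level vector.
token_reprs = np.stack([hidden[idx].mean(axis=0) for idx in word_to_subwords])

# Project the token-level representations onto a 2-D space with t-SNE.
tsne = TSNE(n_components=2, perplexity=2, random_state=0)
coords = tsne.fit_transform(token_reprs)
print(coords.shape)  # (4, 2)
```

Whether the figure averages over subwords, over all occurrences of a token, or uses some other pooling is exactly what I am hoping the authors can clarify.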
Additionally, I could not find the code for visualizing the embeddings in this repository. Would it be possible to provide the code used to produce Figure 3?