character-bert

Printing character level vectors

Open ozturkoktay opened this issue 3 years ago • 2 comments

Hi,

You're printing words and their embeddings using:

for token, embedding in zip(x, embeddings_for_x):
    print(token, embedding)

How can I see each letter's vector?

Sep 08 '21 09:09 ozturkoktay

Hi @ozturkoktay, CharacterBERT is actually a word-level model. Although it reads each word's characters, it produces word-level vectors. If you would really like to look at character vectors, the only way is to extract the character embedding layer. Note, however, that the rows of this matrix correspond not to characters but to UTF-8 bytes. 😊

Sep 09 '21 09:09 helboukkouri

Hi @helboukkouri, how can I extract the character embedding layer? Could you please share a code example?

Sep 12 '21 14:09 ozturkoktay
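
To illustrate the point about UTF-8 bytes, here is a minimal sketch of what "looking up a character vector" would mean. It is not CharacterBERT code: `char_embedding_matrix` is a random stand-in for the extracted character embedding layer, and `EMBEDDING_DIM`, `NUM_BYTE_VALUES` and `byte_vectors` are hypothetical names chosen for this example. The real table may also reserve extra rows for special markers, which this sketch ignores, so check the repository source for the actual layer and its indexing.

```python
import random

EMBEDDING_DIM = 4      # hypothetical size, kept small for readability
NUM_BYTE_VALUES = 256  # one row per possible byte value

random.seed(0)
# Stand-in for the character embedding matrix: random values, NOT real weights.
char_embedding_matrix = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(NUM_BYTE_VALUES)
]


def byte_vectors(word):
    """Return one (byte_value, vector) pair per UTF-8 byte of `word`."""
    return [(b, char_embedding_matrix[b]) for b in word.encode("utf-8")]


# An ASCII letter is one byte; a letter such as "ö" is encoded as two bytes,
# so it is looked up as two rows rather than one "character" vector.
for word in ["cat", "öz"]:
    pairs = byte_vectors(word)
    print(word, "->", len(pairs), "bytes")
    for byte_value, vector in pairs:
        print("  ", byte_value, vector)
```

With real weights in place of the random matrix, the same lookup would give one vector per byte. Those vectors are context-independent inputs to the character CNN, unlike the contextual word-level embeddings printed in the original loop.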