tokenizers Tokenizer Visualizer

I tried using the tokenizer visualizer but it doesn't seem to work when I load the tokenizer using AutoTokenizer.from_pretrained().

Here's the error I'm getting:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-0ae996782cca> in <module>()
----> 1 viz(text="I am a boy")

2 frames
/usr/local/lib/python3.7/dist-packages/tokenizers/tools/visualizer.py in __make_char_states(text, encoding, annotations)
    370         # Todo make this a dataclass or named tuple
    371         char_states: List[CharState] = [CharState(char_ix) for char_ix in range(len(text))]
--> 372         for token_ix, token in enumerate(encoding):
    373             offsets = encoding.token_to_chars(token_ix)
    374             if offsets is not None:

AttributeError: 'list' object has no attribute 'tokens'

See the code below:

from tokenizers.tools import EncodingVisualizer
from transformers import XLMTokenizer, AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
viz = EncodingVisualizer(tokenizer)
viz(text="I am a boy")

@talolard @n1t0 any ideas?

May 05 '22 20:05 ToluClassics

from tokenizers.tools import EncodingVisualizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Change here: pass the underlying tokenizers object, not the transformers wrapper
viz = EncodingVisualizer(tokenizer._tokenizer)
viz(text="I am a boy")

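This is because EncodingVisualizer expects the underlying tokenizers.Tokenizer object, not the transformers wrapper returned by AutoTokenizer. The visualizer calls encode() and expects an Encoding back, but the transformers wrapper's encode() returns a plain list of token ids, hence the AttributeError on a 'list' object.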
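
To avoid relying on the private `_tokenizer` attribute, a minimal sketch along these lines may help. It assumes a fast tokenizer, which exposes the same underlying object through the public `backend_tokenizer` property. `unwrap_tokenizer` and `render_html` are hypothetical helper names, not part of either library.

```python
def unwrap_tokenizer(tok):
    """Return the underlying tokenizers.Tokenizer when given a transformers
    fast tokenizer; return the object unchanged otherwise.

    Hypothetical helper, not part of transformers or tokenizers.
    """
    # Fast tokenizers expose the Rust-backed object as `backend_tokenizer`.
    return getattr(tok, "backend_tokenizer", tok)


def render_html(text, model_name="bert-base-uncased"):
    """Sketch: return the visualizer's HTML for `text` as a string."""
    # Imported lazily so the helper above works without these packages.
    from tokenizers.tools import EncodingVisualizer
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # default_to_notebook=False makes the call return the HTML string
    # instead of displaying it through IPython.
    viz = EncodingVisualizer(unwrap_tokenizer(tokenizer), default_to_notebook=False)
    return viz(text=text)
```

A slow (pure Python) tokenizer has no `backend_tokenizer`, so it would be passed through unchanged and the visualizer would still fail. AutoTokenizer loads the fast version by default when one exists for the model.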

Cheers !

May 06 '22 07:05 Narsil

@Narsil, you can close this.

Oct 24 '23 21:10 Taytay
