tokenizers
Tokenizer Visualizer
I tried using the tokenizer visualizer (EncodingVisualizer), but it doesn't seem to work when I load the tokenizer with AutoTokenizer.from_pretrained().
Here's the error I'm getting:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-34-0ae996782cca> in <module>()
----> 1 viz(text="I am a boy")
2 frames
/usr/local/lib/python3.7/dist-packages/tokenizers/tools/visualizer.py in __make_char_states(text, encoding, annotations)
370 # Todo make this a dataclass or named tuple
371 char_states: List[CharState] = [CharState(char_ix) for char_ix in range(len(text))]
--> 372 for token_ix, token in enumerate(encoding):
373 offsets = encoding.token_to_chars(token_ix)
374 if offsets is not None:
AttributeError: 'list' object has no attribute 'tokens'
See the code below:
from tokenizers.tools import EncodingVisualizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
viz = EncodingVisualizer(tokenizer)  # passes the transformers tokenizer object directly
viz(text="I am a boy")  # raises the AttributeError shown above
@talolard @n1t0 any ideas?
from tokenizers.tools import EncodingVisualizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
viz = EncodingVisualizer(tokenizer._tokenizer)  # Change here: pass the underlying tokenizers.Tokenizer
viz(text="I am a boy")
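This happens because EncodingVisualizer calls encode() on the object it is given and expects a tokenizers Encoding back. The transformers tokenizer's encode() returns a plain list of token IDs instead, hence the 'list' object error. Pass the underlying tokenizers object rather than the transformers wrapper.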
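If you'd rather not reach into the private _tokenizer attribute, here is a minimal sketch of a small helper. It assumes that a fast transformers tokenizer exposes the same wrapped object through its public backend_tokenizer property, and that slow (pure-Python) tokenizers report is_fast as False. The helper name to_backend_tokenizer is hypothetical and not part of either library.

```python
def to_backend_tokenizer(tok):
    """Return an object EncodingVisualizer can consume (hypothetical helper).

    Assumption: fast transformers tokenizers expose the wrapped
    tokenizers.Tokenizer via the public `backend_tokenizer` property.
    """
    # Fast transformers tokenizer: unwrap it so that encode() returns an
    # Encoding (with .tokens, .offsets, ...) instead of a list of IDs.
    backend = getattr(tok, "backend_tokenizer", None)
    if backend is not None:
        return backend
    # Slow (pure-Python) transformers tokenizers have no tokenizers backend,
    # so fail early with a clear message instead of the AttributeError.
    if getattr(tok, "is_fast", None) is False:
        raise TypeError(
            "EncodingVisualizer needs a fast tokenizer; load it with "
            "AutoTokenizer.from_pretrained(..., use_fast=True)"
        )
    # Anything else (e.g. a tokenizers.Tokenizer already) passes through unchanged.
    return tok
```

Usage would then be viz = EncodingVisualizer(to_backend_tokenizer(tokenizer)), which also accepts a plain tokenizers.Tokenizer without modification.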
Cheers!
@Narsil, you can close this.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.