Add an article on neural "tokenization"
Hey there :space_invader:
Here's an article on why and how to replace current tokenizers.
The model behind it is called tokun: it specializes in text embeddings.
It produces much denser and more meaningful vectors than the embeddings derived from traditional tokenizers.
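To give a rough idea of the contrast (the `tokun` call below is only a placeholder, the actual API is covered in the article and notebooks):

```python
# a classic subword tokenizer turns text into a long list of integer IDs,
# each ID then being mapped to its own embedding vector
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("attention is all you need")["input_ids"]
print(len(ids), ids)  # one ID (hence one embedding) per subword

# tokun instead embeds the text directly into a few dense vectors;
# `tokun.encode` is a placeholder name used only for this sketch
# embeddings = tokun.encode("attention is all you need")  # (n_chunks, embed_dim)
```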
The link to Hugging Face (end of article) is not yet valid: I have to export my TF model first :)
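For reference, the export will probably look something like this (a rough sketch using `huggingface_hub`; the repo id and file paths are placeholders):

```python
import tensorflow as tf
from huggingface_hub import HfApi

# load the trained model (placeholder path)
model = tf.keras.models.load_model("models/tokun.keras")

# re-export it as a single file that is easy to host on the Hub
model.save("export/tokun.keras")

# create the repo linked at the end of the article and push the file
# (placeholder repo id until the export is actually done)
api = HfApi()
api.create_repo(repo_id="your-username/tokun", exist_ok=True)
api.upload_file(
    path_or_fileobj="export/tokun.keras",
    path_in_repo="tokun.keras",
    repo_id="your-username/tokun",
)
```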
BTW I've also written notebooks (training and demo).