Add an article on neural "tokenization"

Open · apehex opened this issue 1 year ago · 1 comment

Hey there :space_invader:

Here's an article on why and how to replace current tokenizers.

The model behind it is called tokun: it specializes in text embeddings. It produces much denser and more meaningful vectors than those derived from traditional tokenizers.
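
To give an idea of what that means in practice, here is a minimal sketch of a byte-level neural encoder that maps groups of raw bytes to dense vectors instead of discrete vocabulary indices. It is not the actual tokun architecture, just a stand-in written in Keras; the group size and embedding width are made-up hyperparameters:

```python
# Illustration only, NOT the actual tokun architecture: a stand-in byte-level
# encoder that turns groups of raw UTF-8 bytes into dense vectors instead of
# discrete vocabulary indices. GROUP and DIM are made-up hyperparameters.
import tensorflow as tf

GROUP = 4   # number of bytes merged into one embedding (assumed)
DIM = 256   # width of each output vector (assumed)

encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=256, output_dim=DIM),  # one slot per byte value
    tf.keras.layers.Reshape((-1, GROUP * DIM)),                # concatenate each group of bytes
    tf.keras.layers.Dense(DIM, activation="tanh"),             # compress the group into one dense vector
])

text = "hello world!"
data = list(text.encode("utf-8"))
data += [0] * (-len(data) % GROUP)       # pad to a multiple of GROUP
vectors = encoder(tf.constant([data]))   # shape: (1, len(data) // GROUP, DIM)
print(vectors.shape)                     # (1, 3, 256) -> 3 dense vectors for 12 bytes
```

The point is the shape of the output: a few continuous vectors instead of a long sequence of integer token ids.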

The link to Hugging Face (at the end of the article) is not yet valid: I have to export my TF model first :)
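
For reference, the export step would look roughly like this (a sketch only, assuming a trained Keras model and the `huggingface_hub` client; the model variable, file name, and repo id are placeholders, not the final link):

```python
# Sketch only: save the trained model and push it to the Hugging Face Hub.
# `tokun_model`, "tokun.keras" and "apehex/tokun" are placeholder names.
import tensorflow as tf
from huggingface_hub import HfApi

# Stand-in for the trained model; the real one would come from a training checkpoint.
tokun_model = tf.keras.Sequential([tf.keras.layers.Dense(256)])
tokun_model.build(input_shape=(None, 16))

tokun_model.save("tokun.keras")  # single-file Keras format (assumes Keras >= 2.13)

api = HfApi()  # requires a valid HF token (huggingface-cli login)
api.create_repo(repo_id="apehex/tokun", exist_ok=True)
api.upload_file(path_or_fileobj="tokun.keras", path_in_repo="tokun.keras", repo_id="apehex/tokun")
```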

apehex · Jun 20 '24 14:06

BTW I've written notebooks too (training and demo)

apehex · Jun 20 '24 14:06