
Export tokenizers to HuggingFace (e.g. TinyStories 260K)

Open • nickypro opened this issue 2 years ago • 0 comments

In https://github.com/karpathy/llama2.c/pull/395 support was added to export the model to work with HuggingFace transformers. However, this only covers the model and not the tokenizer, so the export only works with the default tokenizer.

To use the converted TinyStories 260K model, one needs its smaller vocabulary/tokenizer, so it would be good to be able to export the tokenizer to the HuggingFace format as well.
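As a rough starting point, here is a minimal sketch of reading the vocabulary back out of a llama2.c-style `tokenizer.bin`, which is the first thing a tokenizer export would need. It is not a full HuggingFace tokenizer export. The file layout is an assumption (a `uint32` max token length, then per token a `float32` score, a `uint32` byte length, and the raw bytes, all little-endian), and the function names `read_llama2c_vocab` and `vocab_to_records` are hypothetical.

```python
import io
import json
import struct


def read_llama2c_vocab(stream):
    """Read (token_bytes, score) pairs from a llama2.c-style tokenizer.bin.

    Assumed layout (little-endian): uint32 max_token_length, then for each
    token a float32 score, a uint32 byte length, and that many raw bytes.
    """
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("missing max_token_length header")
    (max_token_length,) = struct.unpack("<I", header)

    entries = []
    while True:
        record = stream.read(8)
        if not record:  # clean end of file
            break
        if len(record) != 8:
            raise ValueError("truncated token record")
        score, length = struct.unpack("<fI", record)
        token = stream.read(length)
        if len(token) != length:
            raise ValueError("truncated token bytes")
        entries.append((token, score))
    return max_token_length, entries


def vocab_to_records(entries):
    """Turn the entries into JSON-friendly records; the id is the file position."""
    return [
        {"id": idx, "token": token.decode("utf-8", errors="replace"), "score": score}
        for idx, (token, score) in enumerate(entries)
    ]


if __name__ == "__main__":
    # Tiny in-memory example instead of a real tokenizer.bin.
    blob = struct.pack("<I", 5)
    for tok, sc in [(b"<unk>", 0.0), (b"a", -1.5)]:
        blob += struct.pack("<fI", sc, len(tok)) + tok
    _, demo_entries = read_llama2c_vocab(io.BytesIO(blob))
    print(json.dumps(vocab_to_records(demo_entries), indent=2))
```

The records would still have to be mapped onto whatever tokenizer files HuggingFace expects for this model, including special tokens and byte fallback, which this sketch does not attempt.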

nickypro • Sep 28 '23 13:09