chore: remove usage of load_tiktoken_bpe

Open leseb opened this issue 5 months ago • 2 comments

What does this PR do?

The load_tiktoken_bpe() function depends on blobfile to load tokenizer.model files. blobfile in turn pulls in pycryptodomex, which it uses primarily for JWT signing in GCP. We don’t need that functionality, because we always load tokenizers from local files. pycryptodomex also implements its own cryptographic primitives, which are known to be problematic and insecure. blobfile could potentially switch to the more secure PyCA cryptography library, but the project appears inactive, so this transition may not happen soon.

Fortunately, load_tiktoken_bpe() is a simple function: it reads a BPE file and returns a dictionary mapping byte sequences to their mergeable ranks. That is straightforward enough for us to implement ourselves.
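For illustration, a local replacement could look roughly like the sketch below. It assumes the tokenizer.model file uses the usual tiktoken BPE text layout, with one base64-encoded token and its integer rank per line, separated by whitespace. The name `load_bpe_file` and the error message are hypothetical and not necessarily what this PR uses.

```python
import base64
from pathlib import Path


def load_bpe_file(model_path: Path) -> dict[bytes, int]:
    """Read a local BPE file and map each token's bytes to its mergeable rank.

    Hypothetical helper: assumes each non-empty line has the form
    "<base64 token> <rank>".
    """
    mergeable_ranks: dict[bytes, int] = {}
    # Plain local file read, so no blobfile (and no pycryptodomex) is needed.
    with open(model_path, "rb") as f:
        lines = f.read().splitlines()
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            token, rank = line.split()
            mergeable_ranks[base64.b64decode(token)] = int(rank)
        except ValueError as e:
            # Covers a wrong field count, invalid base64, and a non-integer rank.
            raise ValueError(
                f"Malformed BPE entry on line {line_number} of {model_path}"
            ) from e
    return mergeable_ranks
```

The returned dictionary has the same shape the description attributes to load_tiktoken_bpe(), so it could be passed wherever the mergeable ranks were used before.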

Test Plan

Run unit tests

May 27 '25 09:05 leseb

llama-stack