Cache for Encoding - Runtime Boosted by 12%
This PR introduces a caching mechanism in _encode_ordinary_native(), which stores the tokens for each "piece" of text. When a piece of text is repeated, its tokens are retrieved from the cache instead of being tokenized again.
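For illustration, here is a minimal Rust sketch of this kind of per-call piece cache. The names `Rank` and `byte_pair_encode` mirror tiktoken's Rust core, but the span iterator and signatures here are assumptions for the example, not the actual PR diff:

```rust
use std::collections::HashMap;

type Rank = u32;

// Hypothetical sketch: encode the pieces of `text`, consulting a cache keyed
// by each piece's bytes before running the (assumed) BPE merge function.
fn encode_with_piece_cache(
    text: &str,
    piece_spans: impl Iterator<Item = (usize, usize)>, // spans from the regex split
    byte_pair_encode: impl Fn(&[u8]) -> Vec<Rank>,
) -> Vec<Rank> {
    let mut cache: HashMap<&[u8], Vec<Rank>> = HashMap::new();
    let mut out: Vec<Rank> = Vec::new();
    for (start, end) in piece_spans {
        let piece = &text.as_bytes()[start..end];
        // Cache hit: reuse the stored tokens; miss: encode once and store.
        let tokens = cache
            .entry(piece)
            .or_insert_with(|| byte_pair_encode(piece));
        out.extend_from_slice(tokens);
    }
    out
}

fn main() {
    // Toy "BPE" (one token per byte), just to exercise the cache path.
    let bpe = |piece: &[u8]| piece.iter().map(|&b| b as Rank).collect::<Vec<_>>();
    let text = "ababab";
    // Pretend the regex split produced three identical pieces "ab":
    // only the first is encoded, the other two are cache hits.
    let spans = vec![(0, 2), (2, 4), (4, 6)].into_iter();
    let tokens = encode_with_piece_cache(text, spans, bpe);
    assert_eq!(tokens.len(), 6);
    println!("{tokens:?}");
}
```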
This results in a runtime improvement of over 12% (from 20.21s to 17.96s on a single CPU core) when encoding 100MB of Linux source code as a single text.
The cache hit ratio is very high, approximately 95%, and the final cache size is only about 0.5% of the total number of pieces (218,450 distinct pieces vs. 39,769,721 total).
TODO:
- Despite the 95% cache hit ratio, the expected runtime boost was not fully realized: about 80% of the loop runtime in the current code is spent splitting the text with the regex. While this PR makes the tokenization logic 65% faster, the big gain would come from optimizing the text splitting, possibly through multithreading.
- Investigate declaring the cache in the `struct CoreBPE` so that it can be utilized across subsequent calls (see the sketch below).
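On that second point, one possible shape for a persistent cache, sketched under assumptions: the real `CoreBPE` has other fields (encoder tables, compiled regexes) elided here, `byte_pair_encode` stands in for the actual merge loop, and the `Mutex` is just one way to keep `&self` encode methods usable:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Rank = u32;

// Hypothetical sketch only: a piece cache stored on CoreBPE so that hits
// persist across encode calls instead of being rebuilt per call.
pub struct CoreBPE {
    // ... encoder, special-token tables, compiled regex, etc. elided ...
    piece_cache: Mutex<HashMap<Vec<u8>, Vec<Rank>>>,
}

impl CoreBPE {
    fn byte_pair_encode(&self, piece: &[u8]) -> Vec<Rank> {
        // Placeholder: the real implementation performs BPE merges here.
        piece.iter().map(|&b| b as Rank).collect()
    }

    /// Encode one piece, serving repeated pieces from the shared cache.
    fn encode_piece_cached(&self, piece: &[u8]) -> Vec<Rank> {
        if let Some(tokens) = self.piece_cache.lock().unwrap().get(piece) {
            return tokens.clone();
        }
        let tokens = self.byte_pair_encode(piece);
        self.piece_cache
            .lock()
            .unwrap()
            .insert(piece.to_vec(), tokens.clone());
        tokens
    }
}
```

An actual implementation would also want to bound the cache's growth and measure lock contention under concurrent callers; a per-thread or lock-free map would be alternatives worth benchmarking.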