
Cache for Encoding - Runtime Boosted by 12%

Majdoddin opened this pull request 1 year ago · 0 comments

This PR introduces a caching mechanism in _encode_ordinary_native(), which stores the tokens for each "piece" of text. When a piece of text is repeated, its tokens are retrieved from the cache instead of being tokenized again.
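The idea can be sketched as follows (a minimal illustration with assumed names, not tiktoken's actual implementation: `byte_pair_encode` here is a placeholder for the real BPE merge loop, and `encode_with_cache` stands in for the loop inside `_encode_ordinary_native()`). Each regex "piece" is looked up in a `HashMap`; only on a miss is the BPE step run:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for tiktoken's BPE merge step; the real code
// repeatedly merges the highest-ranked byte pair in the piece.
fn byte_pair_encode(piece: &[u8]) -> Vec<u32> {
    // Placeholder: emit one token per byte.
    piece.iter().map(|&b| b as u32).collect()
}

// Encode a text already split into pieces, memoizing tokens per distinct piece.
// Returns the token stream and the number of distinct pieces cached.
fn encode_with_cache(pieces: &[&[u8]]) -> (Vec<u32>, usize) {
    let mut cache: HashMap<Vec<u8>, Vec<u32>> = HashMap::new();
    let mut out = Vec::new();
    for &piece in pieces {
        // Cache hit: reuse the stored tokens instead of re-running BPE.
        let toks = cache
            .entry(piece.to_vec())
            .or_insert_with(|| byte_pair_encode(piece));
        out.extend_from_slice(toks);
    }
    (out, cache.len())
}

fn main() {
    // Repeated pieces are tokenized once and then served from the cache.
    let pieces: Vec<&[u8]> = vec![&b"the"[..], &b" cat"[..], &b"the"[..], &b" cat"[..]];
    let (tokens, cached) = encode_with_cache(&pieces);
    println!("{} tokens from {} distinct cached pieces", tokens.len(), cached);
}
```

Because source code repeats the same identifiers, keywords, and whitespace runs constantly, the distinct-piece set stays tiny relative to the total piece count, which is why the hit ratio below is so high.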

This yields a speedup of over 12% (runtime dropped from 20.21 s to 17.96 s on a single CPU core) when encoding 100 MB of Linux source code as a single text.

The cache hit ratio is very high, approximately 95%, and the final cache size is only about 0.5% of the total number of pieces (218,450 cached entries vs. 39,769,721 pieces).

TODO:

  • Despite the 95% cache hit ratio, the expected runtime gain was not fully realized: about 80% of the loop runtime in the current code is spent splitting the text with the regex. While this PR makes the tokenization step itself 65% faster, the big remaining gain lies in optimizing the text splitting, possibly through multithreading.
  • Investigate declaring the cache in the struct CoreBPE so that it can be utilized across subsequent calls.
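The second TODO item could look roughly like this (a sketch with assumed type and field names, not tiktoken's actual `CoreBPE` API). Since the encoder is typically called through shared references, the cache needs interior mutability, e.g. a `Mutex` (a concurrent map would avoid the lock contention in real multithreaded use):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in for the BPE merge step.
fn byte_pair_encode(piece: &[u8]) -> Vec<u32> {
    piece.iter().map(|&b| b as u32).collect()
}

// Simplified stand-in for CoreBPE carrying a persistent per-piece cache,
// so cached tokens survive across subsequent encode calls.
struct CoreBpeWithCache {
    // Mutex provides interior mutability: &self methods can still update it.
    piece_cache: Mutex<HashMap<Vec<u8>, Vec<u32>>>,
}

impl CoreBpeWithCache {
    fn new() -> Self {
        Self { piece_cache: Mutex::new(HashMap::new()) }
    }

    // Tokens for one piece, reusing results cached by earlier calls.
    fn encode_piece(&self, piece: &[u8]) -> Vec<u32> {
        let mut cache = self.piece_cache.lock().unwrap();
        cache
            .entry(piece.to_vec())
            .or_insert_with(|| byte_pair_encode(piece))
            .clone()
    }
}

fn main() {
    let bpe = CoreBpeWithCache::new();
    let first = bpe.encode_piece(b"hello");  // miss: computed and stored
    let second = bpe.encode_piece(b"hello"); // hit: served from the cache
    assert_eq!(first, second);
    println!("cache size: {}", bpe.piece_cache.lock().unwrap().len());
}
```

A persistent cache would also need an eviction or size-cap policy, since the per-call cache in this PR is bounded by a single text's distinct pieces while a struct-level one grows for the lifetime of the encoder.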

Majdoddin · Jul 10 '24 10:07