Allow subsets of sequences in prefix cacher
Refs #347.
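For readers skimming the thread, here is a minimal sketch of what matching a subset of a sequence could look like. It assumes that "subsets" means reusing a cached entry whose tokens form a strict prefix of the new prompt. `PrefixCache`, `CachedKv`, and `longest_prefix` are hypothetical names for illustration and are not taken from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cached KV state; a real cache would hold tensors.
#[derive(Debug)]
struct CachedKv {
    /// Number of tokens whose keys/values are stored.
    len: usize,
}

/// Toy prefix cache keyed by the token sequence that produced each entry.
#[derive(Default)]
struct PrefixCache {
    entries: HashMap<Vec<u32>, CachedKv>,
}

impl PrefixCache {
    fn insert(&mut self, tokens: Vec<u32>) {
        let len = tokens.len();
        self.entries.insert(tokens, CachedKv { len });
    }

    /// Return the longest cached sequence that is a strict prefix of `prompt`,
    /// together with the number of prompt tokens that can be skipped.
    fn longest_prefix(&self, prompt: &[u32]) -> Option<(&CachedKv, usize)> {
        // Try the longest candidate first. The range stops before the full
        // prompt so at least one token is always left to run through the model.
        (1..prompt.len())
            .rev()
            .find_map(|n| self.entries.get(&prompt[..n]).map(|kv| (kv, n)))
    }
}
```

This linear scan hashes one slice per candidate length, which is fine for a sketch; a trie would avoid re-hashing the shared prefix.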
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 24          864          731           25          108
 TOML                   15          402          364            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1056            0          782          274
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1565          472          791          302
-------------------------------------------------------------------------------
 Rust                   91        29563        26945          424         2194
 |- Markdown            46          467            0          454           13
 (Total)                          30030        26945          878         2207
===============================================================================
 Total                 159        32382        28455         1232         2695
===============================================================================
Nice, this is cool! I'll work on evicting entries from the prefix cache once a byte threshold is exceeded. Long term, we could bring the trie structure back and do something vLLM-style with pointers to caches?
Yeah, a byte threshold metric would be a big improvement!
Regarding the trie structure, it should be doable for non-quantized models where I can modify the matmul implementation. I think that would improve the scaling for non-quantized models, too.
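A rough sketch of the byte-threshold idea discussed above, assuming oldest-first eviction and that each entry knows the size of its KV tensors in bytes. `BoundedCache`, `Entry`, and `max_bytes` are hypothetical names; the eviction policy actually chosen may well differ (for example, least-recently-used).

```rust
use std::collections::VecDeque;

/// Hypothetical cache entry: token ids plus the byte size of its KV tensors.
struct Entry {
    tokens: Vec<u32>,
    bytes: usize,
}

/// Toy cache that evicts oldest-first once a byte budget is exceeded.
struct BoundedCache {
    entries: VecDeque<Entry>, // front = oldest entry
    total_bytes: usize,
    max_bytes: usize,
}

impl BoundedCache {
    fn new(max_bytes: usize) -> Self {
        Self { entries: VecDeque::new(), total_bytes: 0, max_bytes }
    }

    /// Add an entry, then evict from the front until the budget is met.
    /// Returns how many entries were evicted.
    fn insert(&mut self, tokens: Vec<u32>, bytes: usize) -> usize {
        self.entries.push_back(Entry { tokens, bytes });
        self.total_bytes += bytes;
        let mut evicted = 0;
        while self.total_bytes > self.max_bytes {
            match self.entries.pop_front() {
                Some(old) => {
                    self.total_bytes -= old.bytes;
                    evicted += 1;
                }
                None => break,
            }
        }
        evicted
    }

    /// True if an entry with exactly these tokens is still cached.
    fn contains(&self, tokens: &[u32]) -> bool {
        self.entries.iter().any(|e| e.tokens.as_slice() == tokens)
    }
}
```

Note that an entry larger than the whole budget is evicted immediately after insertion in this sketch; rejecting it up front would be an equally reasonable choice.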
So close:
master (correct):
> hi
Hello! How can I assist you today?
> what is graphene
Graphene is a two-dimensional material made of carbon atoms arranged in a hexagonal lattice. It is a very strong and flexible material, with a high electrical conductivity and thermal conductivity. Graphene has many potential applications in fields such as electronics, energy storage, and materials science. It is considered to be one of the most promising materials for the future, due to its unique properties and potential for widespread use.
prefix_cacher_subseq:
> hi
Hello! How can I assist you today?
> what is graphene  # <---- prefix cacher is used here
Graphene
Graphene is a two-dimensional material made of carbon atoms arranged in a hexagonal lattice. It is a highly conductive and flexible material with a high surface area and unique electronic properties. Graphene has been shown to have potential applications in various fields, including electronics, energy storage, and biomedicine.
Looks like a RoPE seqlen offset problem.
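If that diagnosis is right, the general issue can be illustrated with a small standalone sketch: when the KV cache for the first `offset` tokens is reused, the remaining tokens must be rotated at their absolute positions (`offset`, `offset + 1`, ...), not starting again from 0. The functions below are hypothetical and are not the code in this branch; they assume the interleaved-pair RoPE layout, whereas some implementations pair the two halves of the vector instead.

```rust
/// Apply rotary position embedding to one head vector at absolute position `pos`.
/// Each pair (x[2i], x[2i+1]) is rotated by the angle pos * base^(-2i/d).
fn rope(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        let theta = (pos as f32) * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        out[2 * i] = x[2 * i] * cos - x[2 * i + 1] * sin;
        out[2 * i + 1] = x[2 * i] * sin + x[2 * i + 1] * cos;
    }
    out
}

/// Rotate a chunk of new token vectors that follow `offset` already-cached tokens.
fn rope_chunk(chunk: &[Vec<f32>], offset: usize, base: f32) -> Vec<Vec<f32>> {
    chunk
        .iter()
        .enumerate()
        .map(|(i, x)| rope(x, offset + i, base))
        .collect()
}
```

Processing a 5-token sequence in one pass should match processing the last 2 tokens with `offset = 3`; passing `offset = 0` for that tail gives different rotations, which is the kind of mismatch that would make cached and uncached runs diverge.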