mistral.rs

Allow subsets of sequences in prefix cacher

Open EricLBuehler opened this issue 1 year ago • 4 comments

Refs #347.

EricLBuehler avatar May 27 '24 00:05 EricLBuehler

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 24          864          731           25          108
 TOML                   15          402          364            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1056            0          782          274
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1565          472          791          302
-------------------------------------------------------------------------------
 Rust                   91        29563        26945          424         2194
 |- Markdown            46          467            0          454           13
 (Total)                          30030        26945          878         2207
===============================================================================
 Total                 159        32382        28455         1232         2695
===============================================================================
  

github-actions[bot] avatar May 27 '24 00:05 github-actions[bot]

nice, this is cool! I'll work on deleting from the prefix-cache if some byte-threshold was exceeded. Long term we could bring the trie structure back and do some vLLM style stuff with pointers to caches?

gregszumel avatar May 28 '24 14:05 gregszumel

> nice, this is cool! I'll work on deleting from the prefix-cache if some byte-threshold was exceeded. Long term we could bring the trie structure back and do some vLLM style stuff with pointers to caches?

Yeah, a byte threshold metric would be a big improvement!
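
For illustration, a minimal sketch of what byte-threshold eviction could look like. This is not the mistral.rs implementation: `BoundedPrefixCache`, `Entry`, `kv_bytes`, and `max_bytes` are hypothetical names, and oldest-first eviction is just one possible policy.

```rust
use std::collections::VecDeque;

/// Hypothetical cache entry: the token IDs of a prompt plus the size in
/// bytes of the KV cache stored for it. Not the real mistral.rs type.
struct Entry {
    tokens: Vec<u32>,
    kv_bytes: usize,
}

/// Hypothetical prefix cache that evicts its oldest entries once the total
/// stored KV cache size exceeds `max_bytes`.
struct BoundedPrefixCache {
    entries: VecDeque<Entry>, // front = oldest, back = newest
    total_bytes: usize,
    max_bytes: usize,
}

impl BoundedPrefixCache {
    fn new(max_bytes: usize) -> Self {
        Self { entries: VecDeque::new(), total_bytes: 0, max_bytes }
    }

    /// Insert an entry, then evict oldest-first until the total is back
    /// under the threshold. Returns the number of evicted entries.
    /// An entry larger than `max_bytes` on its own is evicted immediately.
    fn insert(&mut self, tokens: Vec<u32>, kv_bytes: usize) -> usize {
        self.total_bytes += kv_bytes;
        self.entries.push_back(Entry { tokens, kv_bytes });
        let mut evicted = 0;
        while self.total_bytes > self.max_bytes {
            match self.entries.pop_front() {
                Some(old) => {
                    self.total_bytes -= old.kv_bytes;
                    evicted += 1;
                }
                None => break,
            }
        }
        evicted
    }

    /// True if an entry with exactly these tokens is still cached.
    fn contains(&self, tokens: &[u32]) -> bool {
        self.entries.iter().any(|e| e.tokens == tokens)
    }
}
```

With `max_bytes = 100`, inserting entries of 60, 30, and 50 bytes evicts only the 60-byte entry, leaving 80 bytes cached. A real cache would probably prefer least-recently-used order over insertion order.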

Regarding the trie structure, it should be doable for non-quantized models, where I can modify the matmul implementation. I think that would improve the scaling for non-quantized models, too.
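
As a rough sketch of the lookup side of the trie idea only: a trie keyed by token ID, where a node can hold a handle to a stored KV cache and a lookup returns the longest cached prefix of a new prompt. `PrefixTrie`, `TrieNode`, and `cache_id` are hypothetical names, and this is not vLLM's design or the matmul change mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical trie node keyed by token ID. `cache_id` stands in for a
/// pointer or handle to a stored KV cache covering the tokens on the path
/// from the root to this node.
#[derive(Default)]
struct TrieNode {
    children: HashMap<u32, TrieNode>,
    cache_id: Option<usize>,
}

#[derive(Default)]
struct PrefixTrie {
    root: TrieNode,
}

impl PrefixTrie {
    /// Record that the KV cache identified by `cache_id` covers `tokens`.
    fn insert(&mut self, tokens: &[u32], cache_id: usize) {
        let mut node = &mut self.root;
        for &tok in tokens {
            node = node.children.entry(tok).or_default();
        }
        node.cache_id = Some(cache_id);
    }

    /// Find the longest cached sequence that is a prefix of `tokens`.
    /// Returns `(matched_len, cache_id)`, or `None` if nothing matches.
    fn longest_prefix(&self, tokens: &[u32]) -> Option<(usize, usize)> {
        let mut node = &self.root;
        let mut best = None;
        for (i, tok) in tokens.iter().enumerate() {
            match node.children.get(tok) {
                Some(child) => {
                    node = child;
                    if let Some(id) = node.cache_id {
                        best = Some((i + 1, id));
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

Lookup cost grows with the length of the prompt rather than with the number of cached sequences. The returned `matched_len` is the number of tokens whose KV entries could be reused.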

EricLBuehler avatar May 29 '24 18:05 EricLBuehler

So close:

master (correct):

> hi
Hello! How can I assist you today?
> what is graphene
Graphene is a two-dimensional material made of carbon atoms arranged in a hexagonal lattice. It is a very strong and flexible material, with a high electrical conductivity and thermal conductivity. Graphene has many potential applications in fields such as electronics, energy storage, and materials science. It is considered to be one of the most promising materials for the future, due to its unique properties and potential for widespread use.

prefix_cacher_subseq:

> hi
Hello! How can I assist you today?
> what is graphene # <---- prefix cacher used here
Graphene


Graphene is a two-dimensional material made of carbon atoms arranged in a hexagonal lattice. It is a highly conductive and flexible material with a high surface area and unique electronic properties. Graphene has been shown to have potential applications in various fields, including electronics, energy storage, and biomedicine.

Looks like a RoPE seqlen offset problem.
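
If that diagnosis is right, the rough idea is this: when the KV cache for the first `n` tokens is reused, the new tokens have to be rotated at positions `n, n + 1, ...` rather than starting again from 0. A minimal standalone sketch, assuming interleaved pairs and a base of 10000; `rope` and `rope_new_tokens` are hypothetical helpers, not the mistral.rs functions.

```rust
/// Apply rotary position embedding to one head vector `x` (even length)
/// at absolute position `pos`. Assumes the interleaved-pair layout; real
/// implementations may split the vector into halves instead.
fn rope(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        // Rotation angle for pair i: pos * base^(-2i/d)
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        out[2 * i] = x[2 * i] * cos - x[2 * i + 1] * sin;
        out[2 * i + 1] = x[2 * i] * sin + x[2 * i + 1] * cos;
    }
    out
}

/// Rotate only the new (uncached) tokens. `seqlen_offset` is the number of
/// tokens already covered by the reused KV cache.
fn rope_new_tokens(xs: &[Vec<f32>], seqlen_offset: usize) -> Vec<Vec<f32>> {
    xs.iter()
        .enumerate()
        .map(|(i, x)| rope(x, seqlen_offset + i, 10000.0))
        .collect()
}
```

Calling `rope_new_tokens` with `seqlen_offset = 0` after a cache hit would place the new queries and keys at the wrong positions relative to the cached keys. That could plausibly produce output that is nearly right but not identical, like the transcript above.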

EricLBuehler avatar Jun 03 '24 01:06 EricLBuehler