Jiaxin Shan
Introducing tokenization also brings some complexity in tokenizer management (if we want every model to use its own tokenizer). We need to weigh the benefits and at least make this...
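To illustrate what that management overhead could look like, here is a minimal sketch of a per-model tokenizer cache; the `TokenizerRegistry` name and the lazy-loading policy are my assumptions, not an existing component:

```python
# Hypothetical sketch: lazily load and cache one tokenizer per model,
# so each model can use its own tokenizer without reloading per request.
from transformers import AutoTokenizer

class TokenizerRegistry:
    def __init__(self):
        self._tokenizers = {}  # model name -> tokenizer instance

    def get(self, model_name: str):
        # Load on first use, then reuse; keeping these in sync with the
        # deployed models is the management complexity mentioned above.
        if model_name not in self._tokenizers:
            self._tokenizers[model_name] = AutoTokenizer.from_pretrained(model_name)
        return self._tokenizers[model_name]

registry = TokenizerRegistry()
token_ids = registry.get("deepseek-ai/deepseek-llm-7b-chat").encode("hello world")
```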
@gaocegege Originally, I thought the token-based solution could be more aligned with the "page" tokens in vLLM, and chunk-by-chunk alignment would be tidier compared to two different...
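A minimal sketch of the chunk-by-chunk idea, assuming a fixed block size in the spirit of vLLM's KV-cache blocks; the `block_hashes` helper, the chained-hash scheme, and the block size of 16 are illustrative assumptions:

```python
import hashlib
from typing import List

BLOCK_SIZE = 16  # illustrative; aligned with a vLLM-style KV-cache block size

def block_hashes(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash token IDs chunk by chunk, chaining each block's hash with the
    previous one so each hash identifies the whole prefix up to that block."""
    hashes, prev = [], ""
    full = len(token_ids) - len(token_ids) % block_size  # drop the partial tail
    for i in range(0, full, block_size):
        chunk = token_ids[i:i + block_size]
        prev = hashlib.sha256((prev + ",".join(map(str, chunk))).encode()).hexdigest()
        hashes.append(prev)
    return hashes
```

Chaining the hashes means two requests share a hash exactly when they share the entire token prefix up to that block, which is what makes chunk-by-chunk alignment with the cache pages tidy.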
@varungup90 @DwyaneShi can you spend some time on this issue?
I will spend some time implementing this as an alternative to decision-tree or composite-metrics-based algorithms.
@kerthcet A little bit different. Currently, the primary work still relies on vLLM's automatic prefix caching, without an additional KV cache compressor or reuse capabilities. It is more on the routing side; I...
v0.3.0 has enough routing strategies invented and improved:
- Preble (radix tree + prediction-based load awareness)
- Fairness
- Prefix Cache (hashing block) + heuristic load awareness

Due to...
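As a rough illustration of the "Prefix Cache + heuristic load aware" combination from the list above, here is a sketch of one plausible routing rule; the `route` function, the `pod_cache`/`pod_load` structures, and the tiebreak heuristic are my assumptions, not the actual implementation:

```python
from typing import Dict, List, Set

def route(prefix_hashes: List[str],
          pod_cache: Dict[str, Set[str]],
          pod_load: Dict[str, int]) -> str:
    """Pick the pod caching the longest matching block-hash prefix;
    break ties (including zero matches) by routing to the least-loaded pod."""
    def match_len(pod: str) -> int:
        n = 0
        for h in prefix_hashes:
            if h not in pod_cache[pod]:
                break
            n += 1
        return n
    # Prefer more matched prefix blocks, then lower load.
    return max(pod_load, key=lambda p: (match_len(p), -pod_load[p]))

# Example: pod "a" caches the first block, pod "b" caches nothing but is idle.
print(route(["h1", "h2"], {"a": {"h1"}, "b": set()}, {"a": 5, "b": 0}))  # -> "a"
```

The load tiebreak keeps cold prefixes from piling onto a single hot pod while still exploiting cache hits when they exist.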
@a-mccarthy I will work on the blog once the enhancement PR is merged.
@rashansmith thanks for the feedback. The code PR has been merged and I am starting to draft the blog post today.
I made a draft and updated the PR here. It still needs some diagrams and content; I will try to finish it soon.
@rashansmith Sorry for the delay. I added the diagrams and polished some paragraphs, so this should be in good shape for review now. Please take a look.