
[Design Question] Any plan to decouple batching and cache from llama?

Open jinuxstyle opened this issue 1 year ago • 1 comment

Is there any reason why the batching and cache manager are implemented inside llama? These look like generic functionalities that are better not mixed with the llama code. It also misleadingly suggests that turbomind only supports llama.

And is there any plan to abstract and decouple those functionalities from llama?
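The decoupling the question asks for can be illustrated with a minimal sketch: a generic batch scheduler and KV-cache manager that drive any model through a narrow interface, instead of being embedded in the llama implementation. All names here (`ModelBackend`, `KVCacheManager`, `BatchScheduler`) are hypothetical and are not actual turbomind code:

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Any model (llama or otherwise) only needs to expose a step() call."""

    def step(self, token_ids: list[int], cache_slot: int) -> int: ...


class KVCacheManager:
    """Generic cache-slot allocator, independent of the model architecture."""

    def __init__(self, num_slots: int):
        self.free = list(range(num_slots))
        self.used: dict[int, int] = {}  # request id -> slot

    def allocate(self, request_id: int) -> int:
        slot = self.free.pop()
        self.used[request_id] = slot
        return slot

    def release(self, request_id: int) -> None:
        self.free.append(self.used.pop(request_id))


@dataclass
class BatchScheduler:
    """Batches requests and drives any ModelBackend via the shared cache."""

    model: ModelBackend
    cache: KVCacheManager

    def run(self, requests: dict[int, list[int]]) -> dict[int, int]:
        out = {}
        for rid, tokens in requests.items():
            slot = self.cache.allocate(rid)
            out[rid] = self.model.step(tokens, slot)
            self.cache.release(rid)
        return out
```

With this split, adding a new model family means implementing only the `step()` interface; the batching and cache logic are reused unchanged.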

jinuxstyle avatar Sep 26 '23 03:09 jinuxstyle

In fact, turbomind currently only supports llama-family models. 😂

Decoupling the engine from the model implementation is ongoing work (likely to finish in October).

lzhangzz avatar Sep 26 '23 03:09 lzhangzz