lmdeploy
【Design Question】Any plan to decouple batching and cache from llama?
Is there any reason why the batching and cache manager are implemented inside llama? Those look like generic functionalities, and it seems better not to mix them with the llama-specific code. As it stands, it is misleading: it looks as if turbomind only supports llama.
Is there any plan to abstract and decouple those functionalities from llama?
In fact, turbomind currently only supports llama-family models. 😂
Decoupling the engine from the model implementation is ongoing work (likely to finish in October).
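To make the design question concrete, here is a toy Python sketch of the separation being asked about: a generic engine owns batching and per-request cache slots, while the model only implements a single forward step behind an abstract interface. This is not turbomind's actual API; all names (`ModelBackend`, `Engine`, `EchoBackend`) are made up for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ModelBackend(ABC):
    """Model-specific code: only knows how to run one forward step."""

    @abstractmethod
    def forward(self, token: int, cache: List[int]) -> int: ...


class EchoBackend(ModelBackend):
    """Toy stand-in for a real model (e.g. a llama implementation)."""

    def forward(self, token: int, cache: List[int]) -> int:
        cache.append(token)   # the "KV cache" grows by one entry per step
        return token + 1      # dummy next-token rule, just for the sketch


class Engine:
    """Generic batching + cache management, independent of any model family."""

    def __init__(self, backend: ModelBackend) -> None:
        self.backend = backend
        self.caches: Dict[int, List[int]] = {}  # per-request cache slots

    def add_request(self, req_id: int) -> None:
        self.caches[req_id] = []

    def step(self, batch: Dict[int, int]) -> Dict[int, int]:
        # Run one decoding step for every request in the batch;
        # the engine never needs to know which model it is driving.
        return {rid: self.backend.forward(tok, self.caches[rid])
                for rid, tok in batch.items()}


engine = Engine(EchoBackend())
engine.add_request(1)
engine.add_request(2)
out = engine.step({1: 10, 2: 20})  # batched step across two requests
```

With this split, supporting a new model family only requires a new `ModelBackend` subclass; the batching and cache logic stays untouched.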