lmdeploy
【Design Question】Any plan to decouple batching and cache from llama?
Is there any reason why the batching and cache manager are implemented inside llama? Those look like generic functionalities, and it seems better not to mix them with the llama-specific code. As it stands, it is misleading: it looks as if turbomind only supports llama.
Is there any plan to abstract and decouple those functionalities from llama?
In fact, turbomind currently only supports llama-family models. 😂
Decoupling the engine from the model implementation is ongoing work (likely to finish in October).
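To make the design question concrete, here is a toy Python sketch of the separation being asked about: a generic engine owns batching and per-request cache slots, while the model only implements a single forward step behind an abstract interface. This is not turbomind's actual API; all names (`ModelBackend`, `Engine`, `EchoBackend`) are made up for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ModelBackend(ABC):
    """Model-specific code: only knows how to run one forward step."""

    @abstractmethod
    def forward(self, token: int, cache: List[int]) -> int: ...


class EchoBackend(ModelBackend):
    """Toy stand-in for a real model (e.g. a llama implementation)."""

    def forward(self, token: int, cache: List[int]) -> int:
        cache.append(token)   # the "KV cache" grows by one entry per step
        return token + 1      # dummy next-token rule, just for the sketch


class Engine:
    """Generic batching + cache management, independent of any model family."""

    def __init__(self, backend: ModelBackend) -> None:
        self.backend = backend
        self.caches: Dict[int, List[int]] = {}  # per-request cache slots

    def add_request(self, req_id: int) -> None:
        self.caches[req_id] = []

    def step(self, batch: Dict[int, int]) -> Dict[int, int]:
        # Run one decoding step for every request in the batch;
        # the engine never needs to know which model it is driving.
        return {rid: self.backend.forward(tok, self.caches[rid])
                for rid, tok in batch.items()}


engine = Engine(EchoBackend())
engine.add_request(1)
engine.add_request(2)
out = engine.step({1: 10, 2: 20})  # batched step across two requests
```

With this split, supporting a new model family only requires a new `ModelBackend` subclass; the batching and cache logic stays untouched.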