llama.cpp
Feature Request: How to support a model with a dynamic inference graph
Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
How to support a model that has a different inference graph between the prefill and decoding stages.
Hi, thanks for your attention. I want to support a model in llama.cpp that uses a different compute graph during prefill than during decoding. Does llama.cpp support a dynamic inference graph (e.g. skipping some layers during the prefill stage)? If so, how can this be done?
Motivation
Speed up prefill by skipping some layers.
Possible Implementation
Add a flag `is_prefilling` to `llama_context`, analogous to the existing `is_encoding` flag, and branch on it when building the compute graph?
Same question. Any update?
This issue was closed because it has been inactive for 14 days since being marked as stale.