text-generation-inference

Guidance acceleration

Open Atry opened this issue 1 year ago • 8 comments

Feature request

Guidance can control the format of generated output, which would be a nice feature to have built in:

  • Add an extra parameter to the /generate and /generate_stream protocol to specify a template
  • Use a Guidance-like token mask to alternate between generating tokens and pre-filling known tokens
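
The alternation described in the two points above can be sketched roughly as follows. This is only an illustration, not an existing text-generation-inference API: the `{{gen name}}` placeholder syntax, the `split_template` and `run_template` helpers, and the `generate(prompt, slot_name)` callback are all hypothetical names chosen for the example, and a real implementation would work on tokens and logit masks rather than strings.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical placeholder syntax: {{gen name}} marks a slot the model fills in.
SLOT = re.compile(r"\{\{gen (\w+)\}\}")


def split_template(template: str) -> List[Tuple[str, str]]:
    """Split a template into ("text", literal) and ("gen", slot_name) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in SLOT.finditer(template):
        if match.start() > pos:
            segments.append(("text", template[pos:match.start()]))
        segments.append(("gen", match.group(1)))
        pos = match.end()
    if pos < len(template):
        segments.append(("text", template[pos:]))
    return segments


def run_template(
    template: str, generate: Callable[[str, str], str]
) -> Tuple[str, Dict[str, str]]:
    """Alternate between pre-filling known text and asking the model for a slot.

    `generate(prompt, slot_name)` is a stand-in for a model call; it is an
    assumption of this sketch, not a real server endpoint.
    """
    prompt = ""
    values: Dict[str, str] = {}
    for kind, payload in split_template(template):
        if kind == "text":
            prompt += payload          # known text: appended, never sampled
        else:
            value = generate(prompt, payload)  # unknown text: produced by the model
            values[payload] = value
            prompt += value
    return prompt, values


if __name__ == "__main__":
    canned = {"name": "Ada", "age": "36"}
    text, slots = run_template(
        '{"name": "{{gen name}}", "age": {{gen age}}}',
        lambda prompt, slot: canned[slot],  # toy model for demonstration
    )
    print(text)
    print(slots)
```

With the toy model above, only the two slot values come from `generate`; the surrounding JSON punctuation is pre-filled from the template.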

Motivation

See https://github.com/microsoft/guidance#guidance-acceleration-notebook

Your contribution

Just an idea for now

Atry avatar Jun 28 '23 22:06 Atry

For now, wouldn't it be better to add text-generation-inference as a supported LLM in Guidance, in the same way as was done for OpenAI's models?

Vinno97 avatar Jun 29 '23 13:06 Vinno97

Right, that's possible, but built-in support would "significantly improve inference performance".

When multiple generation or LLM-directed control flow statements are used in a single Guidance program then we can significantly improve inference performance by optimally reusing the Key/Value caches as we progress through the prompt. This means Guidance only asks the LLM to generate the green text below, not the entire program. This cuts this prompt's runtime in half vs. a standard generation approach.
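
To make the difference concrete, here is a toy counting model. It is an assumption-laden sketch, not a measurement: it simply counts how many token positions pass through the model when a client re-sends the full prefix for every generation step versus when the server keeps the Key/Value cache between steps. The `tokens_processed` function and the segment lengths are made up for illustration, and the count ignores that prompt encoding is usually cheaper per token than decoding.

```python
from typing import List, Tuple


def tokens_processed(segments: List[Tuple[str, int]], reuse_cache: bool) -> int:
    """Count token positions run through the model for a templated prompt.

    segments: list of (kind, n_tokens) where kind is "text" (known tokens)
    or "gen" (tokens the model must generate).
    """
    prefix_len = 0   # tokens already present in the prompt so far
    processed = 0
    for kind, n in segments:
        if reuse_cache:
            # With a reused Key/Value cache every token is processed once.
            processed += n
        elif kind == "gen":
            # Without reuse, each generation request re-encodes the whole
            # prefix before producing its n new tokens.
            processed += prefix_len + n
        prefix_len += n
    return processed


if __name__ == "__main__":
    # Hypothetical program: 20 known tokens, 5 generated, 10 known, 5 generated.
    program = [("text", 20), ("gen", 5), ("text", 10), ("gen", 5)]
    print("with cache reuse:   ", tokens_processed(program, reuse_cache=True))
    print("without cache reuse:", tokens_processed(program, reuse_cache=False))
```

The gap between the two counts grows with the number of generation steps, which is why client-side orchestration over /generate cannot match a built-in implementation.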

Atry avatar Jun 29 '23 15:06 Atry

Similar feature in another text generation project: https://github.com/ggerganov/llama.cpp/pull/1773

Atry avatar Jul 25 '23 01:07 Atry

See also https://github.com/go-skynet/LocalAI/issues/354

Atry avatar Jul 26 '23 22:07 Atry

Similar feature in another text generation project: ggerganov/llama.cpp#1773

I wonder whether it is possible to get a stream when processing a batch.

yubuyuabc avatar Aug 25 '23 10:08 yubuyuabc

This is one of our priorities for the next release.

OlivierDehaene avatar Sep 06 '23 13:09 OlivierDehaene

Do you happen to have any updates on this matter? Coming from LMQL

KreshLaDoge avatar Jan 31 '24 13:01 KreshLaDoge

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar May 13 '24 01:05 github-actions[bot]