text-generation-inference
Guidance acceleration
Feature request
Guidance can constrain the format of the generated output, which would be a nice feature to have built in.
- Add an extra parameter to the `/generate` and `/generate_stream` protocol to specify a template (see the request sketch below)
- Use a guidance-like mask to alternate between pre-filling known tokens from the template and generating the remaining tokens
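A rough sketch of what such a request could look like, assuming a new `guidance_template` field under `parameters`; the field name and template syntax here are purely illustrative, not part of the current API:

```python
# Hypothetical request against an extended /generate endpoint.
# "guidance_template" is an assumed parameter name for this proposal.
import requests

payload = {
    "inputs": "Generate a character profile.",
    "parameters": {
        "max_new_tokens": 64,
        # Fixed spans would be pre-filled into the KV cache without sampling;
        # only the {gen ...} / {select ...} slots would be decoded by the model.
        "guidance_template": (
            '{"name": "{gen name}", "age": {gen age, regex=[0-9]+}, '
            '"class": "{select class, options=[warrior, mage, rogue]}"}'
        ),
    },
}

resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json())
```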
Motivation
See https://github.com/microsoft/guidance#guidance-acceleration-notebook
Your contribution
Just an idea for now
For now, wouldn't it be better to add text-generation-inference as a supported LLM in Guidance, the same way it already supports OpenAI's models?
Right, that's possible, but built-in support would "significantly improve inference performance".
When multiple generation or LLM-directed control flow statements are used in a single Guidance program then we can significantly improve inference performance by optimally reusing the Key/Value caches as we progress through the prompt. This means Guidance only asks the LLM to generate the green text below, not the entire program. This cuts this prompt's runtime in half vs. a standard generation approach.
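A minimal sketch of that alternation, with a stub model standing in for the actual decoding loop (the real server would operate on token ids and carry the attention Key/Value cache across segments instead of a plain string):

```python
# Alternating prefill/generate loop: known template text is force-fed into the
# cache, and only the template's slots are actually sampled by the model.
from dataclasses import dataclass


@dataclass
class StubModel:
    # Stands in for the LLM; "cache" plays the role of the KV cache.
    cache: str = ""

    def prefill(self, text: str) -> None:
        # Known text is appended to the cache without sampling any tokens.
        self.cache += text

    def generate(self, stop: str) -> str:
        # Placeholder for constrained decoding up to a stop string.
        out = "<generated>"
        self.cache += out + stop
        return out


def run_template(segments):
    """segments: list of ("text", fixed_str) or ("gen", stop_str) pairs."""
    model = StubModel()
    filled = []
    for kind, value in segments:
        if kind == "text":
            model.prefill(value)                        # pre-fill known tokens
            filled.append(value)
        else:
            filled.append(model.generate(stop=value))   # sample only this slot
            filled.append(value)
    return "".join(filled)


print(run_template([
    ("text", '{"name": "'),
    ("gen", '", '),
    ("text", '"class": "'),
    ("gen", '"}'),
]))
```

Because the fixed spans are never sampled, the model only pays decode cost for the slots, which is where the claimed runtime savings come from.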
Similar feature in another text generation project: https://github.com/ggerganov/llama.cpp/pull/1773
See also https://github.com/go-skynet/LocalAI/issues/354
I wonder whether it could still stream responses while processing batches.
This is one of our priorities for the next release.
Do you happen to have any updates on this matter? Coming from LMQL.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.