Raw text streaming with Hugging Face transformers?
What's the best way to take advantage of the streaming capabilities of Hugging Face transformers in this library? I see that streaming is all handled internally, but it's unclear how it's exposed to the library user (me).
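For context, plain transformers already exposes token streaming via TextIteratorStreamer (assuming transformers >= 4.28; the model name here is just a placeholder), so the question is really how guidance surfaces the same thing:

from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("I like", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks, so run it in a background thread and consume the streamer
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=20))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()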
I figured out a few hacky methods. The only promising one so far is using list_append with a list-like object. The code needs a lot of work: append is called once at the beginning, and then __setitem__ is repeatedly called with key -1, so the whole thing works more like a callback (output down below):
class dynlist:
    """List-like shim: every mutation forwards the current data to a callback."""
    def __init__(self, callback):
        self.data = []
        self.callback = callback

    def append(self, item):
        self.data.append(item)
        self.callback(self.data)

    def __setitem__(self, key, val):
        self.data[key] = val
        self.callback(self.data)
import functools
from typing import List

def update_x(session_id, data: List[str]):
    ...

# note: the gen variable name must use single quotes inside the double-quoted template
prompt = guidance("...{{~gen 'response' list_append=True temperature=0.4 top_p=0.9}}")
my_session_id = ...
response = dynlist(functools.partial(update_x, my_session_id))
await prompt("...", llm=llm, stream=True, async_mode=True, response=response)
list-append:
set-item: -1 I
set-item: -1 I like
set-item: -1 I like hanging
set-item: -1 I like hanging out
set-item: -1 I like hanging out with
set-item: -1 I like hanging out with you
set-item: -1 I like hanging out with you.
set-item: -1 I like hanging out with you.
set-item: -1 I like hanging out with you.
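Since each callback receives the whole list, and its last element is the full text so far (as the log shows), deriving per-step deltas is straightforward. A minimal sketch, with a hypothetical DeltaTracker helper that update_x could delegate to:

# Hypothetical helper: convert the growing data[-1] text into per-step deltas.
class DeltaTracker:
    def __init__(self):
        self._seen = ""

    def on_update(self, data):
        full_text = data[-1] if data else ""
        new_text = full_text[len(self._seen):]  # text appended since last call
        self._seen = full_text
        return new_text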
This is a great question! I will try to get back to you tomorrow after considering what would be the best thing to expose for this.
+1
Below is the general design I had, but I'll be dropping guidance for now (I noticed in my testing that guidance was merely sending the same prompt I was generating manually... my use case is almost certainly too simple for the added complexity right now).
One issue is that prompt(...) doesn't return anything useful to await on, so I have no idea when generation is actually complete (except that the callback eventually receives the endoftext token).
https://gist.github.com/sheenobu/69e70f4ef65778d8ad57cb18db2b5071
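For completeness, one unofficial way to detect completion is to wait on the program's internal _execute_complete event (the same internal the workaround later in this thread relies on). A sketch, not a supported API, with wait_until_done being a hypothetical helper:

import asyncio

# Sketch only: _execute_complete is a guidance internal and may change.
async def wait_until_done(program, poll_interval=0.1):
    while not program._execute_complete.is_set():
        await asyncio.sleep(poll_interval)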
I would consider this a duplicate of #25
Using internal classes, this is my workaround for now, and it seems to be working. Sharing in case it's useful for anyone else.
import asyncio

import guidance
import nest_asyncio


def iter_over_async(ait, loop):
    """Consume an async iterator from synchronous code by driving the loop."""
    ait = ait.__aiter__()

    async def get_next():
        try:
            obj = await ait.__anext__()
            return False, obj
        except StopAsyncIteration:
            return True, None

    while True:
        done, obj = loop.run_until_complete(get_next())
        if done:
            break
        yield obj


async def generator_for_new_tokens(program, *args, **kwargs):
    """Poll a running guidance program and yield only the newly generated text."""
    future = program(*args, **kwargs, silent=True, async_mode=True)
    starting_text = future.text
    while not future._execute_complete.is_set():
        await asyncio.sleep(0.2)
        snapshot = future.text
        yield snapshot[len(starting_text):]
        starting_text = snapshot
    # flush whatever arrived between the last poll and completion
    yield future.text[len(starting_text):]


def run_and_stream(program, *args, **kwargs):
    """Synchronous generator over the new tokens of an async guidance program."""
    try:
        # allow nesting if we're already inside a running event loop
        other_loop = asyncio.get_event_loop()
        nest_asyncio.apply(other_loop)
    except RuntimeError:
        pass
    loop = asyncio.new_event_loop()
    full_text = ""
    for new_text in iter_over_async(generator_for_new_tokens(program, *args, **kwargs), loop):
        if new_text:
            full_text += new_text
            yield new_text
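Usage looks roughly like this (model name and template are placeholders):

llm = guidance.llms.Transformers("gpt2")
program = guidance("Q: What do you like?\nA: {{gen 'response' max_tokens=30}}", llm=llm)

# print each chunk of newly generated text as it arrives
for new_text in run_and_stream(program):
    print(new_text, end="", flush=True)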
Did you try this with the OpenAI models as well?
I think https://github.com/microsoft/guidance/discussions/129 answers this now, feel free to reopen otherwise.