Raw text streaming with Hugging Face transformers?
What's the best way to take advantage of the streaming capabilities of Hugging Face transformers in this library? I see that streaming is all handled internally, but it's unclear how it's exposed to the library user (me).
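For context, plain transformers already exposes token streaming via TextIteratorStreamer (assuming transformers >= 4.28; the model name here is just a placeholder), so the question is really how guidance surfaces the same thing:

from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("I like", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks, so run it in a background thread and consume the streamer
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=20))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()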
I figured out a few hacky methods. The only promising one so far is using list_append with a list-like object. The code needs a lot of work: append is called once at the beginning, and then __setitem__ is repeatedly called with key -1, so the whole thing works more like a callback (output down below):
class dynlist:
    """List-like shim: every mutation forwards the current data to a callback."""
    def __init__(self, callback):
        self.data = []
        self.callback = callback

    def append(self, item):
        self.data.append(item)
        self.callback(self.data)

    def __setitem__(self, key, val):
        self.data[key] = val
        self.callback(self.data)
import functools
from typing import List

def update_x(session_id, data: List[str]):
    ...

# note: the gen variable name must use single quotes inside the double-quoted template
prompt = guidance("...{{~gen 'response' list_append=True temperature=0.4 top_p=0.9}}")
my_session_id = ...
response = dynlist(functools.partial(update_x, my_session_id))
await prompt("...", llm=llm, stream=True, async_mode=True, response=response)
list-append:
set-item: -1 I
set-item: -1 I like
set-item: -1 I like hanging
set-item: -1 I like hanging out
set-item: -1 I like hanging out with
set-item: -1 I like hanging out with you
set-item: -1 I like hanging out with you.
set-item: -1 I like hanging out with you.
set-item: -1 I like hanging out with you.
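Since each callback receives the whole list, and its last element is the full text so far (as the log shows), deriving per-step deltas is straightforward. A minimal sketch, with a hypothetical DeltaTracker helper that update_x could delegate to:

# Hypothetical helper: convert the growing data[-1] text into per-step deltas.
class DeltaTracker:
    def __init__(self):
        self._seen = ""

    def on_update(self, data):
        full_text = data[-1] if data else ""
        new_text = full_text[len(self._seen):]  # text appended since last call
        self._seen = full_text
        return new_text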
This is a great question! I will try to get back to you tomorrow after considering what would be the best thing to expose for this.
+1
Below is the general design I had, but I'll be dropping guidance for now (I noticed in my testing that guidance was merely sending the same prompt I was generating manually... my use case is almost certainly too simple for the added complexity right now).
One issue is that prompt(...) doesn't return anything useful to await on, so I have no idea when generation is actually complete (except that the callback eventually receives the endoftext token).
https://gist.github.com/sheenobu/69e70f4ef65778d8ad57cb18db2b5071
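For completeness, one unofficial way to detect completion is to wait on the program's internal _execute_complete event (the same internal the workaround later in this thread relies on). A sketch, not a supported API, with wait_until_done being a hypothetical helper:

import asyncio

# Sketch only: _execute_complete is a guidance internal and may change.
async def wait_until_done(program, poll_interval=0.1):
    while not program._execute_complete.is_set():
        await asyncio.sleep(poll_interval)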
I would consider this a duplicate of #25
Using internal classes, this is my workaround for now, and it seems to be working. Sharing in case it's useful for anyone else.
import asyncio

import guidance
import nest_asyncio


def iter_over_async(ait, loop):
    """Consume an async iterator from synchronous code by driving the loop."""
    ait = ait.__aiter__()

    async def get_next():
        try:
            obj = await ait.__anext__()
            return False, obj
        except StopAsyncIteration:
            return True, None

    while True:
        done, obj = loop.run_until_complete(get_next())
        if done:
            break
        yield obj


async def generator_for_new_tokens(program, *args, **kwargs):
    """Poll a running guidance program and yield only the newly generated text."""
    future = program(*args, **kwargs, silent=True, async_mode=True)
    starting_text = future.text
    while not future._execute_complete.is_set():
        await asyncio.sleep(0.2)
        snapshot = future.text
        yield snapshot[len(starting_text):]
        starting_text = snapshot
    # flush whatever arrived between the last poll and completion
    yield future.text[len(starting_text):]


def run_and_stream(program, *args, **kwargs):
    """Synchronous generator over the new tokens of an async guidance program."""
    try:
        # allow nesting if we're already inside a running event loop
        other_loop = asyncio.get_event_loop()
        nest_asyncio.apply(other_loop)
    except RuntimeError:
        pass
    loop = asyncio.new_event_loop()
    full_text = ""
    for new_text in iter_over_async(generator_for_new_tokens(program, *args, **kwargs), loop):
        if new_text:
            full_text += new_text
            yield new_text
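Usage looks roughly like this (model name and template are placeholders):

llm = guidance.llms.Transformers("gpt2")
program = guidance("Q: What do you like?\nA: {{gen 'response' max_tokens=30}}", llm=llm)

# print each chunk of newly generated text as it arrives
for new_text in run_and_stream(program):
    print(new_text, end="", flush=True)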
Did you try this with the OpenAI models as well?
I think https://github.com/microsoft/guidance/discussions/129 answers this now, feel free to reopen otherwise.