Arihan Varanasi
Sounds like an interesting use case. It makes it easier to get help with specific bugs/errors as well as recurring questions.
I was attempting to create a function to enable streaming myself and wrote code very similar to @gitkaz's. I've tested this with microsoft/phi-2 on English-language output and it has...
In that case, would it make sense to stream batches of tokens for llama (store and decode multiple at a time)? What would be a generally optimal way to do that?
For llama, this is suboptimal as you mentioned but works for now:

```
def get_stream():
    res = ""
    tokens = []
    for token, prob in generate_step(encoded, model):
        if token ==...
```
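One way to do the batching described above is to buffer tokens and only flush text once the buffer decodes cleanly. Below is a minimal sketch, assuming the `generate_step`, `model`, and `encoded` names from the snippet above and a Hugging Face-style `tokenizer` with `decode` and `eos_token_id` (the `stream_text` name and `max_tokens` parameter are mine); the replacement-character check is a common way to tell that the trailing tokens are still mid-way through a multi-byte character.

```
def stream_text(encoded, model, tokenizer, max_tokens=256):
    tokens = []   # all tokens generated so far
    emitted = ""  # text already yielded to the caller
    for i, (token, _prob) in enumerate(generate_step(encoded, model)):
        if i >= max_tokens or token == tokenizer.eos_token_id:
            break
        tokens.append(token)
        # Re-decode the whole buffer; cheap enough for a sketch.
        text = tokenizer.decode(tokens)
        # "\ufffd" at the end means the last token(s) form an incomplete
        # byte sequence, so hold them back until the next token arrives.
        if not text.endswith("\ufffd"):
            yield text[len(emitted):]
            emitted = text
```

On the consuming side you could then do `for chunk in stream_text(encoded, model, tokenizer): print(chunk, end="", flush=True)`; the trade-off is that output stalls for a token or two whenever a multi-token character is in flight, instead of printing garbage bytes.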
Hey there! Did you deploy `modal_app.py` or `shush.py`?