Arihan Varanasi

5 comments by Arihan Varanasi

Sounds like an interesting use case. It makes it easier to help with specific bugs/errors as well as recurring questions.

I was attempting to write a streaming function myself and ended up with code very similar to @gitkaz's. I've tested this with microsoft/phi-2 on English-language output and it has...

In that case, would it make sense to stream batches of tokens for llama (store several and decode them together)? What would be a generally optimal approach?
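To make the batching idea concrete, here is a minimal sketch of one way it could work: buffer tokens, decode the accumulated sequence only every `batch_size` steps, and emit just the newly produced text. It reuses the `generate_step`, `encoded`, `model`, and `tokenizer` names from the snippet quoted in the next comment; `batch_size` and `max_tokens` are invented knobs, so treat this as an illustration rather than a tested implementation.

```
# Hedged sketch of "store and decode multiple tokens at a time": decode the
# accumulated sequence only every `batch_size` tokens and yield the delta.
# Assumes the same mlx-lm generate_step / tokenizer setup as the snippet
# quoted below; batch_size and max_tokens are illustrative parameters.
def get_stream_batched(encoded, model, tokenizer, batch_size=8, max_tokens=256):
    tokens, emitted = [], ""
    for (token, prob), _ in zip(generate_step(encoded, model), range(max_tokens)):
        token = int(token.item()) if hasattr(token, "item") else int(token)
        if token == tokenizer.eos_token_id:  # stop at end-of-sequence
            break
        tokens.append(token)
        if len(tokens) % batch_size == 0:
            text = tokenizer.decode(tokens)
            yield text[len(emitted):]  # only the text added since the last flush
            emitted = text
    # Flush whatever remains after EOS / max_tokens.
    tail = tokenizer.decode(tokens)[len(emitted):]
    if tail:
        yield tail
```

Decoding still scans the full sequence on each flush, but only once per `batch_size` tokens; emitting deltas of the full decode also avoids the artifacts you can get from decoding token sub-ranges independently with SentencePiece-style tokenizers.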

For llama, this is suboptimal as you mentioned but works for now:

```
def get_stream():
    res = ""
    tokens = []
    for token, prob in generate_step(encoded, model):
        if token ==...
```
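Since the quoted snippet is cut off, here is a minimal, self-contained sketch of how it might continue, assuming mlx-lm's `generate_step` (which yields `(token, prob)` pairs, matching the quote) and a Hugging Face-style tokenizer. The prompt string, the import path, the EOS check, and the per-step `yield` are guesses at what the elided lines did, not the author's actual code.

```
# Hedged completion of the truncated get_stream() above, under the assumption
# that it stops at EOS and re-decodes the whole token list on every step
# (the "suboptimal but works for now" behavior described in the comment).
import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

model, tokenizer = load("microsoft/phi-2")  # model named earlier in the thread
encoded = mx.array(tokenizer.encode("Why is the sky blue?"))  # hypothetical prompt

def get_stream(max_tokens=256):
    tokens = []
    # generate_step is an endless generator, so cap it with max_tokens.
    for (token, prob), _ in zip(generate_step(encoded, model), range(max_tokens)):
        token = int(token.item()) if hasattr(token, "item") else int(token)
        if token == tokenizer.eos_token_id:  # stop at end-of-sequence
            break
        tokens.append(token)
        # Decoding the full sequence each step is O(n) per token, hence
        # O(n^2) overall -- the inefficiency the comment acknowledges.
        yield tokenizer.decode(tokens)

# Usage: print only the newly decoded text each step.
last = ""
for partial in get_stream():
    print(partial[len(last):], end="", flush=True)
    last = partial
print()
```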

Hey there! Did you deploy `modal_app.py` or `shush.py`?