Zijun Zhou

Results 7 issues of Zijun Zhou

- Update the gRPC proto to support:
  - Request: token id or text (one of).
  - Response: token id, text, or both.
- Currently, request with either token id...
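The request/response shape described in this issue can be sketched in Python as follows. This is only an illustration of the "one of" constraint, not the actual proto: the class names `DecodeRequest` and `DecodeResponse` and the field names `text` and `token_ids` are hypothetical and do not come from the issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecodeRequest:
    """Hypothetical request: exactly one of `text` or `token_ids` is set."""
    text: Optional[str] = None
    token_ids: Optional[List[int]] = None

    def __post_init__(self) -> None:
        # Mimic a proto `oneof`: reject "neither" and "both".
        if (self.text is None) == (self.token_ids is None):
            raise ValueError("set exactly one of `text` or `token_ids`")


@dataclass
class DecodeResponse:
    """Hypothetical response: token ids, text, or both."""
    text: Optional[str] = None
    token_ids: Optional[List[int]] = None

    def __post_init__(self) -> None:
        # At least one representation must be present; both is allowed.
        if self.text is None and self.token_ids is None:
            raise ValueError("set `text`, `token_ids`, or both")
```

In a real `.proto` file the request side would most likely be a `oneof` field, which enforces the same "exactly one" rule at the schema level.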

- Customer request: We use multiple languages for clients and cannot implement detokenization in each one. We need server-side detokenization support.

Can we refactor the imports to make MaxText usable as a Python module? It's pretty hard for developers to use or build on top of it. - Blocking inference development with JetStream....

feature request

- Optimized TPU duty cycle (largest gap < 4 ms)
- Optimized TTFT: dispatch prefill tasks ASAP without unnecessary blocking on the CPU, keep backpressure to enforce insert ASAP, return first token...