
Performance optimized interleaved mode JetStream server

Open JoeZijunZhou opened this issue 7 months ago • 2 comments

  • Optimized TPU duty cycle (largest gap < 4 ms).
  • Optimized TTFT (time to first token): dispatch prefill tasks as soon as possible without unnecessary blocking on the CPU, keep backpressure so that inserts happen as soon as possible, and return the first token as soon as possible.
  • Optimized TPOT (time per output token): run the generate and detokenize tasks strictly in sequence, without unnecessary blocking on the CPU.
  • Optimized output token throughput: prioritize prefill appropriately and balance TTFT against decode in high-throughput situations.
  • Tested with a llama2-70b JetStream MaxText server on a v5e-8 VM.
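
To make the prefill / insert / generate / detokenize flow above concrete, here is a minimal toy sketch in plain Python. It is not JetStream's implementation: `run_pipeline`, the queue names, and the fake "token" arithmetic are all hypothetical and exist only for illustration. The bounded `insert_q` stands in for backpressure between prefill and generate, and the first token is handed to detokenization before the insert so it can be returned early.

```python
import queue
import threading


def run_pipeline(prompts, max_pending_inserts=1, decode_steps=3):
    """Toy prefill -> generate -> detokenize pipeline (illustrative only)."""
    prefill_q = queue.Queue()                             # incoming requests
    insert_q = queue.Queue(maxsize=max_pending_inserts)   # bounded: backpressure
    detok_q = queue.Queue()                               # tokens awaiting detokenize
    results = {}

    def prefill_worker():
        while True:
            prompt = prefill_q.get()
            if prompt is None:            # shutdown sentinel
                insert_q.put(None)
                return
            first_token = len(prompt)     # stand-in for real prefill compute
            # Hand the first token to detokenize right away (helps TTFT).
            detok_q.put((prompt, first_token))
            # Blocks while generate is behind, so prefill cannot run far ahead.
            insert_q.put((prompt, first_token))

    def generate_worker():
        while True:
            item = insert_q.get()
            if item is None:
                detok_q.put(None)
                return
            prompt, token = item
            for _ in range(decode_steps):  # stand-in for decode steps
                token += 1
                detok_q.put((prompt, token))

    def detokenize_worker():
        # Single consumer: tokens are detokenized in the order they were produced.
        while True:
            item = detok_q.get()
            if item is None:
                return
            prompt, token = item
            results.setdefault(prompt, []).append(f"tok{token}")

    workers = [
        threading.Thread(target=fn)
        for fn in (prefill_worker, generate_worker, detokenize_worker)
    ]
    for w in workers:
        w.start()
    for p in prompts:
        prefill_q.put(p)
    prefill_q.put(None)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["hi", "hello"]))
```

In this sketch each stage blocks only on its own queue, and the only deliberate stall is the bounded `insert_q`. A real server would decode many requests per step in batched slots; that is omitted here to keep the example short.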

JoeZijunZhou · Jul 26 '24 10:07