
[Dynamic Batching] Concerns about features that are not supported when using Medusa

Open • Ageliss opened this issue 1 year ago • 0 comments

I checked TRT-LLM and found something confusing. Some features are not supported:

  1. Inference batch size must be 1 (this seems to have been solved recently).
  2. In-flight batching is not supported. This is a major concern, since in-flight batching greatly improves throughput.
  3. Only temperature == 0 is supported. What about temperature > 0?
  4. kv_cache: I guess the KV cache needs to be recomputed, since generate_tokens will be > 1.
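To make items 3 and 4 concrete, here is a minimal sketch of greedy (temperature == 0) verification for a single candidate chain, assuming no Medusa tree attention. The function names `greedy_verify` and `verify_and_trim` are hypothetical, and the KV cache is modeled as a plain list of token ids standing in for per-position K/V tensors. It is not TRT-LLM's implementation. The point it illustrates: in this simplified model, KV entries for rejected draft tokens are dropped by truncating the cache rather than recomputed.

```python
from typing import List, Tuple


def greedy_verify(draft: List[int], base_argmax: List[int]) -> List[int]:
    """Return the tokens emitted this step under greedy verification.

    draft: k tokens proposed by (hypothetical) Medusa heads.
    base_argmax: the base model's argmax at the k + 1 positions produced by
        one forward pass over [last_token] + draft.
    """
    accepted: List[int] = []
    for proposed, target in zip(draft, base_argmax):
        if proposed != target:
            break  # first mismatch: stop accepting draft tokens
        accepted.append(proposed)
    # The base model's own prediction at the first mismatch (or after the
    # last draft token) is always valid, so each step emits at least 1 token.
    accepted.append(base_argmax[len(accepted)])
    return accepted


def verify_and_trim(
    kv_cache: List[int], last_token: int, draft: List[int], base_argmax: List[int]
) -> Tuple[List[int], List[int]]:
    """Simulate one step and return (trimmed cache, emitted tokens)."""
    # The verification forward pass appends entries for last_token + k drafts.
    kv_cache = kv_cache + [last_token] + draft
    emitted = greedy_verify(draft, base_argmax)
    n_accepted_draft = len(emitted) - 1
    # Keep last_token and the accepted draft tokens; drop rejected entries.
    # The final emitted token has not been fed yet, so it has no cache entry.
    keep = len(kv_cache) - len(draft) + n_accepted_draft
    return kv_cache[:keep], emitted
```

For example, `verify_and_trim([10, 11], 12, [13, 14, 99], [13, 14, 15, 16])` accepts 13 and 14, rejects 99, emits `[13, 14, 15]`, and leaves the cache as `[10, 11, 12, 13, 14]`. With tree-structured candidates the accepted path is generally not a contiguous suffix of the cache, so a gather/copy of the accepted entries would take the place of this simple truncation.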

My biggest concern: does Medusa2 conflict with in-flight batching?
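A toy model of why the question is non-trivial: with Medusa, each request in a batch may accept a different number of tokens per step, so sequence lengths diverge from step to step. The sketch below is a hypothetical iteration-level scheduler (`Request`, `run_inflight`, and the per-step acceptance counts are all made up for illustration) that admits and retires requests every iteration while each request advances by its own accepted count. It only shows that per-request bookkeeping handles variable advancement in this simplified model; it says nothing about whether TRT-LLM's Medusa path supports in-flight batching.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Request:
    rid: str
    max_new_tokens: int
    accepted_per_step: List[int]  # hypothetical tokens emitted per step (each >= 1)
    generated: int = 0
    step: int = 0


def run_inflight(queue: Deque[Request], max_batch: int) -> Dict[str, int]:
    """Toy iteration-level scheduler; returns the step each request finished."""
    active: List[Request] = []
    finished_at: Dict[str, int] = {}
    t = 0
    while queue or active:
        # Admit waiting requests into free batch slots at every iteration.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        t += 1
        for req in active:
            # Each request advances by its own accepted count, so lengths diverge.
            n = req.accepted_per_step[req.step % len(req.accepted_per_step)]
            req.generated = min(req.generated + n, req.max_new_tokens)
            req.step += 1
            if req.generated >= req.max_new_tokens:
                finished_at[req.rid] = t
        # Retire finished requests immediately, freeing slots for the next step.
        active = [r for r in active if r.generated < r.max_new_tokens]
    return finished_at
```

With `max_batch=2` and requests A (6 tokens, 3 accepted per step), B (4 tokens, 1 per step), and C (3 tokens, alternating 2 and 1), A finishes at step 2, which lets C join at step 3 while B is still running.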


Ageliss • Feb 21 '24 11:02