Keshav Santhanam
The lease variables `duration` and `max_duration` are too ambiguous and should be renamed.
The `Scheduler` and `Profiler` classes currently share a lot of code; we can factor it out into a common superclass (e.g. `SchedulerMechanism`).
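A minimal sketch of what such a refactor could look like. Only the three class names come from the issue; the `name` and `jobs` attributes and the `add_job` and `run_round` methods are hypothetical placeholders for whatever the two classes actually share.

```python
from abc import ABC, abstractmethod


class SchedulerMechanism(ABC):
    """Hypothetical shared base class; every member name here is illustrative."""

    def __init__(self, name):
        self.name = name
        self.jobs = []  # bookkeeping both subclasses would otherwise duplicate

    def add_job(self, job_id):
        # Shared logic lives in one place instead of being copied twice.
        self.jobs.append(job_id)
        return len(self.jobs)

    @abstractmethod
    def run_round(self):
        """Subclass-specific behaviour goes here."""


class Scheduler(SchedulerMechanism):
    def run_round(self):
        return [f"schedule:{job_id}" for job_id in self.jobs]


class Profiler(SchedulerMechanism):
    def run_round(self):
        return [f"profile:{job_id}" for job_id in self.jobs]
```

Making the base class abstract keeps it from being instantiated directly, so only the parts that genuinely differ need to be implemented per subclass.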
Variable-length generation requires `seq_idx` and `cu_seqlens` to span the full input sequence length, but in some cases I would want to include padding tokens (e.g. for maintaining static shapes...
The outputs produced by variable-length generation (i.e., passing `seq_idx` and `cu_seqlens`) do not match the outputs produced by sequentially generating a single request at a time. I have included a...
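For context, a small sketch of how these two arguments are commonly laid out for a packed batch. It assumes the usual convention where `cu_seqlens` holds cumulative token offsets (starting at 0) and `seq_idx` labels each token with the index of the request it belongs to. The issues above do not spell out the exact layout, and the helper name `build_varlen_metadata` is hypothetical.

```python
from itertools import accumulate


def build_varlen_metadata(seq_lens):
    """Return (cu_seqlens, seq_idx) for sequences packed back to back.

    Assumed convention, not taken from the issue text:
      cu_seqlens[i] is the offset of the first token of sequence i,
      seq_idx[t] is the index of the sequence that token t belongs to.
    """
    cu_seqlens = [0] + list(accumulate(seq_lens))
    seq_idx = [i for i, n in enumerate(seq_lens) for _ in range(n)]
    return cu_seqlens, seq_idx


cu_seqlens, seq_idx = build_varlen_metadata([3, 1, 2])
print(cu_seqlens)  # [0, 3, 4, 6]
print(seq_idx)     # [0, 0, 0, 1, 2, 2]
```

Under this convention both arrays cover exactly the packed tokens, so the last entry of `cu_seqlens` equals the total token count and leaves no room for trailing padding.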
Updates the text generation server to use the `DynamicInferenceCoordinator`.