Woosuk Kwon
@yzh119 Oh yes, we don't need a new kernel for decode. However, if I understand correctly, we need a new kernel for prefills?
Hi @njhill, do you mind if we merge #12193 first and review this PR? I'd like to prioritize the spec decode PR as it already got rebased many many times.
@njhill Sorry for the delay. I will review this PR once it's rebased.
@njhill I'm not sure it's worthwhile to change from `[]` to `()`. I did a microbenchmark:
```python
N = 1024
x = []  # List
start = time.perf_counter()
for i...
```
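The benchmark in the comment above is cut off, so its exact body is not known. As a rough, separate sketch of how a list-versus-tuple comparison could be timed, assuming the cost of interest is building a container of `N` items (the helper names `build_list`, `build_tuple`, and `bench` are made up for illustration and are not from the PR):

```python
import timeit

N = 1024  # same size as in the comment; everything else here is an assumption


def build_list(n: int = N) -> list:
    # Append items one by one, as code that uses `[]` typically would.
    out = []
    for i in range(n):
        out.append(i)
    return out


def build_tuple(n: int = N) -> tuple:
    # Build the same items, then freeze them into a tuple.
    return tuple(build_list(n))


def bench(fn, repeat: int = 5, number: int = 200) -> float:
    # Best-of-`repeat` wall time per call, in seconds.
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print(f"list : {bench(build_list) * 1e6:.1f} us per call")
    print(f"tuple: {bench(build_tuple) * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes. The numbers depend on the machine and Python version, so none are quoted here.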
@njhill I think changing `List` to `Sequence` itself is increasing complexity? After that, we need to consider whether it's a tuple or list. I'd prefer to keep using `List` and...
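To illustrate the concern about `Sequence`, here is a small sketch of how the two annotations constrain a callee differently. The function names and the `token_ids` parameter are hypothetical, not taken from the PR:

```python
from typing import List, Sequence


def extend_in_place(token_ids: List[int], new_id: int) -> None:
    # With List, the callee may rely on list-only methods such as append().
    token_ids.append(new_id)


def extended_copy(token_ids: Sequence[int], new_id: int) -> List[int]:
    # With Sequence, the argument may be a tuple, so in-place mutation
    # is not safe; the callee has to copy first.
    out = list(token_ids)
    out.append(new_id)
    return out
```

With `Sequence`, every function that used to mutate its argument either needs a copy like the one above or an explicit check for the concrete type, which is the extra case analysis the comment refers to.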
I'm good with this doc change, but I'm a little worried about the potential confusion and complexity, as the Intel team will be adding IPEX or other Intel-CPU-only optimizations to...
@DamonFool Sorry for misleading you. Yes this PR doesn't have a problem. I just wanted to say we'll need to figure out how to efficiently maintain the Intel and non-Intel...
Any updates?
@abmfy Oh actually, before we merge this PR, can we have a (unit) test?
Also, maybe not in this PR, but it'd be nice if we can group the eplb-related configs (or ep-related ones) into a separate config. We did it for `compilation_config` and...
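One way the grouping suggested above could look, as a minimal sketch using nested dataclasses. The class names `ExampleEplbConfig` and `ExampleParallelConfig` and all field names and defaults are placeholders for illustration, not the project's actual config classes or options:

```python
from dataclasses import dataclass, field


@dataclass
class ExampleEplbConfig:
    # Hypothetical EPLB-related options, grouped in one place.
    enabled: bool = False
    window_size: int = 1000
    step_interval: int = 3000


@dataclass
class ExampleParallelConfig:
    # Hypothetical parent config; the EPLB options live in a sub-config
    # instead of being spread across top-level fields.
    expert_parallel: bool = False
    eplb: ExampleEplbConfig = field(default_factory=ExampleEplbConfig)


if __name__ == "__main__":
    cfg = ExampleParallelConfig(
        expert_parallel=True,
        eplb=ExampleEplbConfig(enabled=True),
    )
    print(cfg.eplb.enabled, cfg.eplb.window_size)
```

Using `default_factory` gives each parent config its own sub-config instance, so changing one does not affect another.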