Thomas Parnell
@comaniac Sure, here are the bf16 results, as well as some other datapoints we have collected. The column `FORCE_TENSOR_CORES` indicates whether the changes from this PR are enabled: https://github.com/vllm-project/vllm/pull/9497 It looks...
This issue seems relevant: https://github.com/flashinfer-ai/flashinfer/issues/520 It sounds like setting `use_tensor_cores=True` actually invokes the prefill kernel, so the issue that @jeejeelee linked above may indeed be very relevant.
@tlrmchlsmth yeah it's exactly that code (which I guess is somehow related to TP>1).
I think we should at least do this: https://github.com/vllm-project/vllm/pull/6327