Thomas Parnell

24 comments by Thomas Parnell

@comaniac Sure, here are the bf16 results, as well as some other datapoints we have collected. The `FORCE_TENSOR_CORES` column indicates whether the changes from https://github.com/vllm-project/vllm/pull/9497 are enabled. It looks...

This issue seems relevant: https://github.com/flashinfer-ai/flashinfer/issues/520

It sounds like setting `use_tensor_cores=True` actually invokes the prefill kernel, so the issue that @jeejeelee linked above may indeed be very relevant.
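To make the dispatch concrete, here is a toy sketch of how a decode step could end up on the prefill kernel path. It does not call flashinfer or vLLM. The helper names `should_use_tensor_cores` and `decode_kernel_path`, the `force` argument, and the group-size threshold of 4 are all assumptions for illustration, not the actual library code.

```python
def should_use_tensor_cores(
    num_qo_heads: int,
    num_kv_heads: int,
    force: bool = False,
    group_threshold: int = 4,  # assumed value, for illustration only
) -> bool:
    """Hypothetical heuristic: use tensor cores when forced, or when the
    number of query heads per KV head (the GQA group size) is large."""
    if num_kv_heads <= 0 or num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a positive multiple of num_kv_heads")
    group_size = num_qo_heads // num_kv_heads
    return force or group_size > group_threshold


def decode_kernel_path(use_tensor_cores: bool) -> str:
    """Models the behaviour described above: with tensor cores enabled,
    the decode call is served by the prefill kernel."""
    return "prefill" if use_tensor_cores else "decode"


if __name__ == "__main__":
    # (query heads, KV heads, force flag) combinations to compare
    for qo, kv, force in [(32, 8, False), (32, 8, True), (64, 8, False)]:
        flag = should_use_tensor_cores(qo, kv, force=force)
        print(qo, kv, force, decode_kernel_path(flag))
```

If the real dispatch works roughly like this, any bug in the prefill kernel would also surface during decode whenever the flag is on, which is why the linked issue could matter here.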

@tlrmchlsmth yeah it's exactly that code (which I guess is somehow related to TP>1).

I think we should at least do this: https://github.com/vllm-project/vllm/pull/6327