flashinfer
ValueError: The dtype of q torch.bfloat16 does not match the q_data_type torch.float16 specified in plan function.
Hello,
I installed flashinfer via AOT. Where do I set q_data_type to torch.bfloat16 in the plan function?
Thank you~
I think vLLM currently uses the v0.1.5-style API, so you can specify q_data_type in the begin_forward function.
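For context, a simplified sketch of the check behind this error may help. The class and attribute names below are illustrative, not flashinfer's actual internals: the idea is that the plan (or begin_forward) call records the expected query dtype, and the run call validates the query tensor's dtype against it, so the fix is to pass the dtype your q tensor will actually have when planning.

```python
# Illustrative sketch only -- PrefillWrapperSketch is a hypothetical stand-in
# for a flashinfer wrapper; dtypes are modeled as strings to stay self-contained.

class PrefillWrapperSketch:
    def plan(self, q_data_type="float16"):
        # plan() caches the expected query dtype for later validation
        self._q_data_type = q_data_type

    def run(self, q_dtype):
        # run() rejects queries whose dtype differs from the planned one
        if q_dtype != self._q_data_type:
            raise ValueError(
                f"The dtype of q {q_dtype} does not match the "
                f"q_data_type {self._q_data_type} specified in plan function."
            )
        return "ok"

wrapper = PrefillWrapperSketch()
# Fix: plan with the same dtype the query tensor will use at run time
wrapper.plan(q_data_type="bfloat16")
print(wrapper.run("bfloat16"))
```

The same mismatch in reverse (planning float16 but running bfloat16) reproduces the error message from the question.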
Closing, since it looks like @yzh119 has answered this, and it's an old question. Please re-open if it's still an issue.