initial support blackwell

Open johnnynunez opened this issue 11 months ago • 10 comments

Compute capability 10.0: Blackwell B100/B200. Compute capability 12.0: Blackwell RTX 50.

johnnynunez avatar Jan 21 '25 22:01 johnnynunez
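
For readers following along, the capability split in the issue description can be sketched as a small lookup. This is only an illustration: `BLACKWELL_CAPABILITIES`, `blackwell_variant`, and `arch_flag` are hypothetical names, not part of flashinfer or PyTorch, and the table contains only the two entries listed in this issue.

```python
from typing import Optional

# Hypothetical table built from the two entries in this issue; real
# hardware may expose other compute capabilities not listed here.
BLACKWELL_CAPABILITIES = {
    (10, 0): "Blackwell B100/B200",
    (12, 0): "Blackwell RTX 50",
}


def blackwell_variant(major: int, minor: int) -> Optional[str]:
    """Return the Blackwell variant for a compute capability, or None."""
    return BLACKWELL_CAPABILITIES.get((major, minor))


def arch_flag(major: int, minor: int) -> str:
    """Format a compute capability as an sm_XY architecture name."""
    return f"sm_{major}{minor}"


if __name__ == "__main__":
    for capability in [(9, 0), (10, 0), (12, 0)]:
        print(capability, blackwell_variant(*capability), arch_flag(*capability))
```

With PyTorch installed and a GPU visible, the `(major, minor)` tuple returned by `torch.cuda.get_device_capability()` could be passed to these helpers.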

Hi @johnnynunez, thanks for bringing this up! Could we hold this PR and wait for the official release of torch 2.6 and the Blackwell software stack?

yzh119 avatar Jan 23 '25 15:01 yzh119

Hi @johnnynunez, thanks for bringing this up! Could we hold this PR and wait for the official release of torch 2.6 and the Blackwell software stack?

Yeah, for sure! I added codegen for the whole Blackwell family to PyTorch. There are also references here: https://github.com/NVIDIA/cccl/issues/3493

johnnynunez avatar Jan 23 '25 16:01 johnnynunez

Hi @johnnynunez, thanks for bringing this up! Could we hold this PR and wait for the official release of torch 2.6 and the Blackwell software stack?

FYI: https://github.com/pytorch/pytorch/pull/145436

johnnynunez avatar Jan 23 '25 16:01 johnnynunez

FYI: https://docs.nvidia.com/cuda/pdf/ptx_isa_8.7.pdf

johnnynunez avatar Jan 23 '25 21:01 johnnynunez

FYI: https://docs.nvidia.com/cuda/pdf/ptx_isa_8.7.pdf

This is huge!

yzh119 avatar Jan 23 '25 21:01 yzh119

@yzh119 can you merge?

johnnynunez avatar Jan 25 '25 10:01 johnnynunez

@yzh119 can you merge?

@johnnynunez a reminder: see https://github.com/flashinfer-ai/flashinfer/pull/747#issuecomment-2610198665

zhyncs avatar Jan 25 '25 10:01 zhyncs

Well, sure... PyTorch is coming this week: M6: Release Day (1/29/25)

johnnynunez avatar Jan 25 '25 10:01 johnnynunez

Is there a prebuilt that can work for B200?

ghostplant avatar Feb 16 '25 15:02 ghostplant

What performance improvement should we expect out of the box on B200 compared to H100 SXM5 for different model sizes (8B, 70B, 400B)? I expected some benefit even for 8B (e.g. 30% at low batch sizes), but I am seeing no benefit with Llama 8B.

Also, is there any planned or in-progress work on flashinfer utilizing B200-specific capabilities (e.g. Tensor Memory Accelerator)?

YavorGIvanov avatar Apr 16 '25 15:04 YavorGIvanov