
Tile primitives for speedy kernels

82 ThunderKittens issues

- What's the key improvement of TK compared to TensorRT? Will TK provide easy-to-use interfaces such as a Python wrapper?

I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If...
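A minimal sketch of the kind of pure-GEMM benchmark being asked about. NumPy's `matmul` stands in for the kernel under test; a real comparison would call the TK (or cuBLAS) GEMM at the marked spot instead. The function name `bench_gemm` and the problem sizes are illustrative, not part of TK.

```python
import time
import numpy as np

def bench_gemm(m, n, k, iters=10):
    """Time a pure GEMM and report achieved GFLOP/s.

    NumPy matmul is a stand-in; swap in the kernel under test here.
    """
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)
    a @ b  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    flops = 2.0 * m * n * k * iters  # one multiply + one add per MAC
    return flops / elapsed / 1e9

print(f"{bench_gemm(512, 512, 512):.1f} GFLOP/s")
```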

I've been working on porting FlashAttention-2 to pre-SM80 architectures (Turing and Volta) and was wondering if TK supports SM70 and SM75 hardware. Writing 100 lines of TK primitives sounds a...

Hello, I'm curious whether the implementation uses the `ldmatrix` instruction for loading tiles from shared memory to registers. It seems the current version doesn't implement `load()` with explicit `ldmatrix` per...

Most recent models use hdim=128; it would be great if ThunderKittens also supported that. https://github.com/HazyResearch/ThunderKittens/blob/a562ed2569c45b0ffea844688594158cb7c6e858/examples/attn/h100/h100_train_atn.py#L25-L26

Using the same random seed, the results of the TK H100 attn_causal kernel vary with each run. In some cases, the max diff between the TK and PyTorch results can be larger...
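Run-to-run variation plus reduced-precision arithmetic means a kernel-vs-reference comparison needs a tolerance rather than exact equality. A minimal NumPy sketch of the kind of max-diff check described above, comparing a float32 attention computation against a float64 reference (the `attention` helper is illustrative, not the TK or PyTorch API):

```python
import numpy as np

def attention(q, k, v):
    """Reference scaled-dot-product attention (no causal mask)."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))  # stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

rng = np.random.default_rng(0)  # fixed seed for reproducibility
q, k, v = (rng.standard_normal((64, 64)) for _ in range(3))

ref = attention(q, k, v)  # float64 reference
approx = attention(q.astype(np.float32), k.astype(np.float32),
                   v.astype(np.float32))  # reduced-precision run

print("max diff:", np.abs(ref - approx).max())
```

With a real fp16/bf16 kernel the max diff is far larger than this float32 example, which is why issue reports usually quote it relative to an expected tolerance.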

AFAIK https://github.com/Dao-AILab/flash-attention/ did not have the bandwidth to support custom `attn_bias` (needed for relpos) - I think it's supported for the Triton version there, but I saw reports that it's...
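For context, a custom `attn_bias` for relative positions is an additive term on the attention scores before the softmax. A minimal NumPy sketch of the idea (the names `attention_with_bias` and `rel_table` are illustrative; this is not the FlashAttention or TK API):

```python
import numpy as np

def attention_with_bias(q, k, v, bias):
    """Scaled-dot-product attention with an additive score bias.

    `bias` has shape (seq_q, seq_k); for relative positions it is
    built by indexing a learned table with the offset (i - j).
    """
    s = q @ k.T / np.sqrt(q.shape[-1]) + bias
    p = np.exp(s - s.max(axis=-1, keepdims=True))  # stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

seq, d = 8, 16
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((seq, d)) for _ in range(3))

rel_table = rng.standard_normal(2 * seq - 1)             # one entry per offset
idx = np.arange(seq)[:, None] - np.arange(seq)[None, :]  # offset i - j
bias = rel_table[idx + seq - 1]                          # shift into table range

out = attention_with_bias(q, k, v, bias)
print(out.shape)  # (8, 16)
```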