Tri Dao

https://tridao.me

Princeton, NJ

Assistant Professor @ Princeton CS, machine learning & systems

429 comments of Tri Dao


flash-attn3 supported L20?

Feel free to work on it if you need it.

flash-attn3 supported L20?

Right, the Ada architecture doesn't have WGMMA or TMA (both were introduced with Hopper). FA2 might already be close to optimal on Ada.
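A minimal sketch of how one might gate on architecture, assuming the (major, minor) compute-capability tuple that `torch.cuda.get_device_capability()` returns. The table only covers the GPUs named in these threads, and `has_wgmma_and_tma` and `pick_flash_attn_version` are hypothetical helpers, not part of flash-attn.

```python
# Compute capabilities for the architectures discussed in these threads.
# torch.cuda.get_device_capability() returns such a (major, minor) tuple.
ARCH_BY_CC = {
    (8, 0): "Ampere (e.g. A100)",
    (8, 9): "Ada (e.g. L20, L40)",
    (9, 0): "Hopper (e.g. H100)",
}


def arch_name(cc):
    """Return a readable architecture name, or 'unknown' if not in the table."""
    return ARCH_BY_CC.get(tuple(cc), "unknown")


def has_wgmma_and_tma(cc):
    """Simplifying assumption: only sm_90 (Hopper) counts in this sketch.

    Architectures newer than Hopper are deliberately left out.
    """
    return tuple(cc) == (9, 0)


def pick_flash_attn_version(cc):
    # Hypothetical policy mirroring the thread: FA3's kernels rely on
    # WGMMA/TMA, so fall back to FA2 on Ampere and Ada.
    return "FA3" if has_wgmma_and_tma(cc) else "FA2"
```

On an L20 or L40 this reports Ada and picks FA2; on an H100 it picks FA3.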

flash-attn3 supported L20?

Of course, that's welcome. It depends on whether people want to contribute.

Compatibility of Flash Attention 3 FP8 Feature with L40 and A100 GPUs

It's not commonly done. FA2 is already close to optimal on A100 (about 70% of the theoretical maximum FLOPS).
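To make "70% of theoretical FLOPS" concrete, here is a small sketch of how utilization is usually computed for attention. It assumes NVIDIA's published A100 peak of 312 TFLOPS for dense FP16/BF16 tensor-core math, and the common benchmarking convention that the forward pass does two matmuls (QK^T and PV) of 2·N²·d FLOPs each per head, halved for causal masking. The function names are illustrative only.

```python
A100_PEAK_BF16_TFLOPS = 312.0  # NVIDIA spec: dense FP16/BF16 tensor-core peak


def attention_fwd_flops(batch, heads, seqlen, headdim, causal=False):
    # Two matmuls (Q @ K^T and P @ V), each 2 * seqlen^2 * headdim FLOPs
    # per (batch, head) pair.
    flops = 4 * batch * heads * seqlen * seqlen * headdim
    # Convention: causal masking skips roughly half of the work.
    return flops // 2 if causal else flops


def utilization(flops, seconds, peak_tflops=A100_PEAK_BF16_TFLOPS):
    """Fraction of the hardware peak achieved by a kernel that took `seconds`."""
    achieved_tflops = flops / seconds / 1e12
    return achieved_tflops / peak_tflops
```

At 70% utilization an A100 sustains roughly 0.7 × 312 ≈ 218 TFLOPS, which leaves limited headroom for a rewrite.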

Compatibility of Flash Attention 3 FP8 Feature with L40 and A100 GPUs

Warp specialization will be difficult without Hopper's async features (e.g. TMA and WGMMA). Overlapping the GEMMs with the softmax would still be useful.
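A sequential sketch of the overlap idea, assuming FlashAttention-style blocked attention with an online softmax for a single query row. The "GEMM" (scores) for block j+1 is issued before the softmax of block j, a simple software-pipelining reorder. Nothing actually runs concurrently here and this is not FA3's kernel; it only shows that the reordering preserves the result.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Plain softmax(q K^T) V for a single query row."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(wi * v[c] for wi, v in zip(w, values)) / z for c in range(dim)]


def attention_pipelined(q, keys, values, block=2):
    """Same result, but block j+1's scores are computed before block j's softmax."""
    blocks = [(keys[i:i + block], values[i:i + block])
              for i in range(0, len(keys), block)]
    m, z = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    next_scores = [dot(q, k) for k in blocks[0][0]]  # prologue: first GEMM
    for j, (_, vals) in enumerate(blocks):
        scores = next_scores
        if j + 1 < len(blocks):
            # On a GPU this GEMM could run on tensor cores while the softmax
            # below runs on other units; here it is only reordered.
            next_scores = [dot(q, k) for k in blocks[j + 1][0]]
        # Online softmax: rescale running sums when the max increases.
        m_new = max(m, max(scores))
        scale = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first block
        p = [math.exp(s - m_new) for s in scores]
        z = z * scale + sum(p)
        acc = [a * scale + sum(pi * v[c] for pi, v in zip(p, vals))
               for c, a in enumerate(acc)]
        m = m_new
    return [a / z for a in acc]
```

Without async copies and asynchronous tensor-core instructions, getting real overlap means hand-scheduling these stages, which is why it is harder on Ampere and Ada.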

Correct method to load 2.7B?

Thanks for the bug report, we've just fixed this. There was a mistake in the mapping between old and new parameter names.
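For readers unfamiliar with this kind of bug, a checkpoint loader typically renames keys from the original checkpoint's naming scheme to the model's. A minimal sketch, assuming regex-based rename rules; the rules below are hypothetical examples, not the actual mapping used for the 2.7B model. The collision check catches one class of mapping mistake, where two old names end up with the same new name.

```python
import re

# Hypothetical rename rules: (regex on the old name, replacement).
RENAME_RULES = [
    (r"^transformer\.h\.(\d+)\.", r"transformer.layers.\1."),
    (r"^transformer\.wte\.", "transformer.embeddings.word_embeddings."),
]


def remap_state_dict(state_dict, rules=RENAME_RULES):
    """Return a new dict with every key passed through the rename rules in order."""
    out = {}
    for name, tensor in state_dict.items():
        new_name = name
        for pattern, repl in rules:
            new_name = re.sub(pattern, repl, new_name)
        if new_name in out:
            # Two old parameters mapping to one new name signals a bad rule.
            raise KeyError(f"two parameters map to {new_name!r}")
        out[new_name] = tensor
    return out
```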

AttributeError: module 'triton.language' has no attribute 'cumsum'

Please use triton >= 2.1.0

AttributeError: module 'triton.language' has no attribute 'cumsum'

triton 2.1.0 should have cumsum. If not, you can try >= 2.2.0.

AttributeError: module 'triton.language' has no attribute 'cumsum'

No, we require tl.cumsum.
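A small sketch for checking an environment before hitting the `AttributeError`, assuming the minimum version from the replies above (triton >= 2.1.0). `importlib.metadata.version` is standard-library; the helper names are illustrative. The `hasattr` check is the most direct test, since it probes exactly the attribute the error complains about.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRITON = (2, 1, 0)  # per the replies above: tl.cumsum needs triton >= 2.1.0


def parse_version(s):
    """'2.1.0' -> (2, 1, 0); '2.2.0+cu121' -> (2, 2, 0). Suffixes like 'rc1' are ignored."""
    parts = []
    for piece in s.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))  # pad "3.0" -> (3, 0, 0)
    return tuple(parts[:3])


def meets_minimum(version_string, minimum=MIN_TRITON):
    return parse_version(version_string) >= minimum


def triton_has_cumsum():
    """Does the installed triton.language define cumsum?"""
    try:
        import triton.language as tl
    except ImportError:
        return False
    return hasattr(tl, "cumsum")


def installed_triton_version():
    """Installed triton version string, or None if triton is not installed."""
    try:
        return version("triton")
    except PackageNotFoundError:
        return None
```

Note that this simple parser treats a pre-release such as `2.1.0rc1` as `2.1.0`.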
