Tri Dao

Results: 429 comments of Tri Dao

Can you check that this line is executed in setup.py? It sets the compiler flag to compile for the 5090, etc.: https://github.com/Dao-AILab/flash-attention/blob/2f9ef0879a0935c3ca852f7a6a7b7a9c24f41e96/setup.py#L190

Right, you need nvcc version >= 12.8 to compile for 5090.
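A minimal sketch of that version check, assuming you only have the nvcc version string to go on (the helper name and exact threshold logic here are illustrative, not code from the repo): CUDA 12.8 is the first toolkit release that can target the Blackwell architecture of the RTX 5090.

```python
# Hypothetical helper: decide whether a given nvcc release is new enough
# to compile kernels for the RTX 5090 (Blackwell). CUDA 12.8 is the
# first toolkit that supports this architecture.
def nvcc_supports_5090(version: str) -> bool:
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) >= (12, 8)

print(nvcc_supports_5090("12.8"))  # new enough
print(nvcc_supports_5090("12.4"))  # too old
```

Note the tuple comparison: comparing `(major, minor)` lexicographically avoids the classic bug of comparing version strings ("12.10" < "12.8" as strings).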

It's just a heuristic to determine num_splits, and in this case it doesn't work super well. We can't use the info in cache_seqlens since that tensor lives on the GPU, and reading it on the host would force a device sync.

Are you suggesting a different heuristic based on cache_max_seq_len? When would that be better / worse than the current heuristic?
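To make the trade-off concrete, here is a rough occupancy-style sketch of what such a heuristic looks like (this is an illustrative simplification, not the actual heuristic in flash-attention): it uses only host-side scalars like cache_max_seq_len, never GPU tensors like cache_seqlens, precisely to avoid the device sync mentioned above.

```python
def num_splits_heuristic(batch: int, num_heads: int, max_seqlen_k: int,
                         num_sms: int, block_n: int = 128,
                         max_splits: int = 128) -> int:
    """Illustrative sketch: split the KV dimension until there are enough
    thread blocks to occupy the GPU's SMs. All inputs are host-side
    scalars, so no GPU->CPU sync is needed."""
    blocks = batch * num_heads   # blocks available before splitting
    splits = 1
    while blocks * splits < num_sms and splits < max_splits:
        splits *= 2
    # Don't split finer than one KV tile (block_n keys) per split.
    kv_tiles = max(1, (max_seqlen_k + block_n - 1) // block_n)
    return min(splits, kv_tiles)
```

The downside of keying on cache_max_seq_len is visible here: if one sequence in the batch is much longer than the rest, the heuristic over- or under-splits for the others, whereas the true per-sequence lengths sit in cache_seqlens on the device.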

There's a PR for that, will be merged soon.

I'm not familiar with XInference

Great, thank you so much! This bug is fixed in 4.1.

Thanks for the great suggestion. We've been pretty busy with a conference deadline, but after this week we'll have more time.

On A100 or H100? If H100 then the Triton version uses new instructions on H100 but FA2 doesn't. You should try FA3 if you're on H100.
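A hedged sketch of that dispatch logic, keyed on CUDA compute capability (the function name and return strings are illustrative; in practice you'd check e.g. torch.cuda.get_device_capability()): FA3 targets Hopper (sm_90), whose new instructions the FA2 kernels don't use, which is why a Hopper-aware Triton kernel can beat FA2 on H100.

```python
def pick_flash_attn_version(cc_major: int, cc_minor: int) -> str:
    """Illustrative backend choice from compute capability:
    sm_90 (Hopper, e.g. H100) -> FA3; sm_80 (Ampere, e.g. A100)
    and other pre-Hopper parts -> FA2."""
    if cc_major == 9:
        return "flash-attn 3"   # H100: uses Hopper-only instructions
    return "flash-attn 2"       # A100 and earlier architectures

print(pick_flash_attn_version(9, 0))  # H100
print(pick_flash_attn_version(8, 0))  # A100
```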