Lang Xu
@lindstro Thank you for the instructions on running ZFP with AMD support! We have collected some throughput numbers on MI100s and A100s with different thread block sizes. - We chose...
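For context, a minimal sketch of the kind of throughput measurement referred to above, using zfp's public C API with the GPU execution policy in fixed-rate mode (which the GPU backend requires). The grid size and rate are assumptions, and the thread-block-size sweep is presumably done by editing the backend's kernel launch parameters rather than through this API; on an AMD/HIP build, the same execution-policy call is assumed to dispatch to the HIP backend:

```cpp
// Rough sketch: time a single GPU zfp_compress() call and report throughput.
// Sizes, rate, and the GPU-policy behavior on the HIP branch are assumptions.
#include <chrono>
#include <cstdio>
#include <vector>
#include "zfp.h"

int main() {
  const size_t nx = 512, ny = 512, nz = 512;    // assumed grid size
  const double rate = 8.0;                      // assumed bits per value
  std::vector<float> data(nx * ny * nz, 1.0f);  // placeholder field

  zfp_field* field = zfp_field_3d(data.data(), zfp_type_float, nx, ny, nz);
  zfp_stream* zfp = zfp_stream_open(nullptr);
  zfp_stream_set_rate(zfp, rate, zfp_type_float, 3, 0);  // fixed-rate mode

  // Select GPU execution; fails if zfp was built without a GPU backend.
  if (!zfp_stream_set_execution(zfp, zfp_exec_cuda)) {
    std::fprintf(stderr, "GPU execution not available in this build\n");
    return 1;
  }

  size_t bufsize = zfp_stream_maximum_size(zfp, field);
  std::vector<unsigned char> buffer(bufsize);
  bitstream* stream = stream_open(buffer.data(), bufsize);
  zfp_stream_set_bit_stream(zfp, stream);
  zfp_stream_rewind(zfp);

  // Wall-clock timing of the compress call (includes host/device transfers).
  auto t0 = std::chrono::steady_clock::now();
  size_t compressed = zfp_compress(zfp, field);
  auto t1 = std::chrono::steady_clock::now();

  double secs = std::chrono::duration<double>(t1 - t0).count();
  double gbs = (data.size() * sizeof(float)) / secs / 1e9;
  std::printf("compressed to %zu bytes in %.3f s (%.2f GB/s uncompressed)\n",
              compressed, secs, gbs);

  stream_close(stream);
  zfp_field_free(field);
  zfp_stream_close(zfp);
  return 0;
}
```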
Will take a look at this!
- [x] Flash-Attention-2 detection code
- [x] All fused kernels (except `fused_rotary_positional_embedding`) built successfully with MI250X + ROCm 5.6.0 without HIP guards
- [x] `fused_rotary_positional_embedding` builds on AMD GPUs
- [x] Adding HIP guards... (an example of such a guard is sketched below)
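As a rough illustration of the HIP guards mentioned in the last item: fused kernels written against CUDA warp primitives often need small guards because AMD wavefronts are 64 lanes wide (vs. 32 on NVIDIA) and older ROCm toolchains lack the `__shfl_*_sync` variants. The macro names and reduction below are one common pattern, not necessarily the exact guards used in this branch:

```cpp
// Sketch of a HIP guard: pick the warp/wavefront width at compile time and
// fall back to the non-_sync shuffle on AMD, so the same fused kernel builds
// on both NVIDIA and AMD GPUs. Macro choice is a convention, an assumption
// here, not taken from the branch above.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
  #define FUSED_WARP_SIZE 64
#else
  #define FUSED_WARP_SIZE 32
#endif

template <typename T>
__device__ T warp_reduce_sum(T val) {
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
  // HIP provides __shfl_down without a participation mask.
  for (int offset = FUSED_WARP_SIZE / 2; offset > 0; offset /= 2)
    val += __shfl_down(val, offset);
#else
  // CUDA requires the _sync variant with an explicit lane mask.
  for (int offset = FUSED_WARP_SIZE / 2; offset > 0; offset /= 2)
    val += __shfl_down_sync(0xffffffff, val, offset);
#endif
  return val;
}
```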
Status Update: Been busy with life things recently; will get a new clean branch pushed out by next week.
The issue persists with torch versions >= 2.2.0; switching to 2.1.2 resolved the error. It can be reproduced by running `PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py`.