Lang Xu
@lindstro Thank you for the instructions on running ZFP with AMD support! We have collected some throughput numbers on MI100s and A100s with different thread block sizes. - We chose...
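For context, a minimal sketch of the kind of throughput measurement referred to above, using zfp's public C API with the GPU execution policy in fixed-rate mode (which the GPU backend requires). The grid size and rate are assumptions, and the thread-block-size sweep is presumably done by editing the backend's kernel launch parameters rather than through this API; on an AMD/HIP build, the same execution-policy call is assumed to dispatch to the HIP backend:

```cpp
// Rough sketch: time a single GPU zfp_compress() call and report throughput.
// Sizes, rate, and the GPU-policy behavior on the HIP branch are assumptions.
#include <chrono>
#include <cstdio>
#include <vector>
#include "zfp.h"

int main() {
  const size_t nx = 512, ny = 512, nz = 512;    // assumed grid size
  const double rate = 8.0;                      // assumed bits per value
  std::vector<float> data(nx * ny * nz, 1.0f);  // placeholder field

  zfp_field* field = zfp_field_3d(data.data(), zfp_type_float, nx, ny, nz);
  zfp_stream* zfp = zfp_stream_open(nullptr);
  zfp_stream_set_rate(zfp, rate, zfp_type_float, 3, 0);  // fixed-rate mode

  // Select GPU execution; fails if zfp was built without a GPU backend.
  if (!zfp_stream_set_execution(zfp, zfp_exec_cuda)) {
    std::fprintf(stderr, "GPU execution not available in this build\n");
    return 1;
  }

  size_t bufsize = zfp_stream_maximum_size(zfp, field);
  std::vector<unsigned char> buffer(bufsize);
  bitstream* stream = stream_open(buffer.data(), bufsize);
  zfp_stream_set_bit_stream(zfp, stream);
  zfp_stream_rewind(zfp);

  // Wall-clock timing of the compress call (includes host/device transfers).
  auto t0 = std::chrono::steady_clock::now();
  size_t compressed = zfp_compress(zfp, field);
  auto t1 = std::chrono::steady_clock::now();

  double secs = std::chrono::duration<double>(t1 - t0).count();
  double gbs = (data.size() * sizeof(float)) / secs / 1e9;
  std::printf("compressed to %zu bytes in %.3f s (%.2f GB/s uncompressed)\n",
              compressed, secs, gbs);

  stream_close(stream);
  zfp_field_free(field);
  zfp_stream_close(zfp);
  return 0;
}
```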
Will take a look at this!
- [x] Flash-Attention-2 detection code
- [x] All fused kernels (except `fused_rotary_positional_embedding`) built successfully with MI250X + ROCm 5.6.0 without HIP guards
- [x] `fused_rotary_positional_embedding` builds on AMD GPUs
- [x] Adding HIP guards... (an example of such a guard is sketched below)
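As a rough illustration of the HIP guards mentioned in the last item: fused kernels written against CUDA warp primitives often need small guards because AMD wavefronts are 64 lanes wide (vs. 32 on NVIDIA) and older ROCm toolchains lack the `__shfl_*_sync` variants. The macro names and reduction below are one common pattern, not necessarily the exact guards used in this branch:

```cpp
// Sketch of a HIP guard: pick the warp/wavefront width at compile time and
// fall back to the non-_sync shuffle on AMD, so the same fused kernel builds
// on both NVIDIA and AMD GPUs. Macro choice is a convention, an assumption
// here, not taken from the branch above.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
  #define FUSED_WARP_SIZE 64
#else
  #define FUSED_WARP_SIZE 32
#endif

template <typename T>
__device__ T warp_reduce_sum(T val) {
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
  // HIP provides __shfl_down without a participation mask.
  for (int offset = FUSED_WARP_SIZE / 2; offset > 0; offset /= 2)
    val += __shfl_down(val, offset);
#else
  // CUDA requires the _sync variant with an explicit lane mask.
  for (int offset = FUSED_WARP_SIZE / 2; offset > 0; offset /= 2)
    val += __shfl_down_sync(0xffffffff, val, offset);
#endif
  return val;
}
```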
Status Update: Been busy with life things recently; will get a new clean branch pushed out by next week.
The issue persists with torch versions >= 2.2.0; switching to 2.1.2 resolved the error. It can be reproduced by running `PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py`.