Hosang
Add additional custom paged attention kernels for AMD Navi 3x/4x GPU support, based on PR https://github.com/vllm-project/vllm/pull/12348. Due to architectural differences from the MI series, specific instructions and detailed logic have...
- Resolved a cache-miss issue during Triton flash attention calls by fixing `MAX_SEQLENS_Q/K` to `0`. `MAX_SEQLENS_Q/K` differs at each step, resulting in different key values and compilation for the...
## Essential Elements of an Effective PR Description Checklist
- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [x] ...