tsuki
Thank you @tfeher for approving the PR. I have fixed the PR description.
/ok to test
Recall is low in the DataT=I8/U8 tests because of https://github.com/rapidsai/raft/pull/2287. If we don't normalize the dataset vectors, all additional vectors tend to be connected to the dataset vector nodes that have a large L2 norm.
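To make the normalization step concrete, here is a minimal NumPy sketch of row-wise L2 normalization for an int8 dataset. It is not taken from RAFT or from the PR linked above: `l2_normalize_rows` is a hypothetical helper, the random dataset is a stand-in, and the conversion to float32 is an assumption made for illustration (the actual tests keep I8/U8 data).

```python
import numpy as np


def l2_normalize_rows(dataset: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale every row to unit L2 norm (hypothetical helper, not a RAFT API)."""
    # Convert to float32 first so the division does not truncate to integers.
    x = dataset.astype(np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero.
    return x / np.maximum(norms, eps)


# Stand-in int8 dataset: 1000 vectors of dimension 64 (assumed sizes).
rng = np.random.default_rng(0)
dataset_i8 = rng.integers(-128, 128, size=(1000, 64), dtype=np.int8)

normalized = l2_normalize_rows(dataset_i8)
row_norms = np.linalg.norm(normalized, axis=1)
```

After this step every row has the same L2 norm, so no node stands out purely because of its magnitude.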
Here is the performance of the top-1000 search using the multi-cta and multi-kernel modes. [Figure: raft-top1000] For a fair comparison, I evaluated the performance with itopk=1024, 2048, and 4096, meaning that...
[Figure: raft-top1000] This figure shows the performance of the multi-CTA kernel with different internal top-k sizes. In the current multi-CTA implementation, the internal top-k size is fixed at 32, but I...
Hi @elvircrn, thank you for using this library. As you mentioned, `mtk::wmma::mma_simt` does not invoke the MMA instruction. Instead, there are two ways to use the `m16n8k16` instruction to compute...