tsuki

Results 6 comments of tsuki

Thank you @tfeher for approving the PR. I have fixed the PR description.

The low recall in the DataT=I8/U8 tests is due to https://github.com/rapidsai/raft/pull/2287. If we don't normalize the dataset vectors, all additional vectors tend to be connected to the dataset vector nodes with a large L2 norm.
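To illustrate the effect described in this comment, here is a minimal sketch in plain Python. It is not RAFT or CAGRA code. It assumes neighbors are chosen by inner-product similarity, which the comment does not state, and it uses a brute-force top-k search instead of a graph. The dataset sizes, the 10x norm scale, and the helper names (`dot`, `normalize`, `large_norm_fraction`) are all hypothetical. Under these assumptions, unnormalized large-norm vectors attract almost all neighbor links, while normalizing the dataset spreads the links across both groups.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def large_norm_fraction(dataset, queries, k, first_large):
    """Fraction of top-k neighbors (by inner product) whose index falls in
    the large-norm group, i.e. index >= first_large."""
    hits = 0
    for q in queries:
        ranked = sorted(range(len(dataset)),
                        key=lambda i: dot(q, dataset[i]),
                        reverse=True)
        hits += sum(1 for i in ranked[:k] if i >= first_large)
    return hits / (k * len(queries))


rng = random.Random(0)  # fixed seed so the run is repeatable
dim, n_small, n_large, k = 8, 100, 100, 4

# Hypothetical dataset: the second half has roughly 10x the L2 norm of the first.
small = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_small)]
large = [[10 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_large)]
dataset = small + large

# Stand-ins for the "additional vectors" that get linked to dataset nodes.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]

raw_fraction = large_norm_fraction(dataset, queries, k, n_small)
normalized_fraction = large_norm_fraction(
    [normalize(v) for v in dataset], queries, k, n_small)

print(f"large-norm share of neighbors, raw dataset:        {raw_fraction:.2f}")
print(f"large-norm share of neighbors, normalized dataset: {normalized_fraction:.2f}")
```

The two printed shares compare how often a large-norm vector is selected as a neighbor before and after normalization. The large-norm group makes up half of the dataset, so a share near 0.5 indicates no bias toward it.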

Here is the performance of the top-1000 search using the multi-CTA and multi-kernel modes. ![raft-top1000](https://github.com/rapidsai/raft/assets/12711693/1b1ee09f-d5f7-4d1d-b1d2-59254008c43a) For a fair comparison, I evaluated the performance with itopk=1024, 2048, and 4096, meaning that...

![raft-top1000](https://github.com/rapidsai/raft/assets/12711693/6a4867c1-5356-4540-b2a0-d2f985389a8b) This figure shows the performance of the multi-CTA kernel with different internal top-k sizes. In the current multi-CTA implementation, the internal top-k size is fixed at 32, but I...

Hi @elvircrn, thank you for using this library. As you mentioned, `mtk::wmma::mma_simt` does not invoke the MMA instruction. Instead, there are two ways to use the `m16n8k16` instruction to compute...