How to use the GPU for retrieval?
Thank you for sharing the code. COIL achieves very impressive retrieval performance. I wonder how to use the GPU for retrieval.
The current public retriever implementation uses PyTorch API calls, so technically it should take as little as adding a few .cuda() calls to make it run on a GPU. Optimizing it may take some effort. I can make a patch, but that could take some time as I currently have quite a few things on my plate.
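For reference, a minimal sketch of what "adding a few .cuda() calls" could look like; the variable names and shapes are hypothetical placeholders, not the actual identifiers in the COIL retriever code:

```python
import torch

# Minimal illustrative sketch, not the actual COIL retriever code.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for index-side dense vectors loaded from disk.
cls_index = torch.randn(100_000, 768)
cls_index = cls_index.to(device)        # move the index to GPU once

# Stand-in for an encoded query vector.
q_cls = torch.randn(1, 768).to(device)  # keep the query on the same device

# The existing matmul-based scoring then runs on GPU unchanged.
scores = q_cls @ cls_index.T            # (1, num_docs) similarity scores
top_scores, top_ids = scores.topk(10, dim=-1)
```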
Thanks. I can implement it myself by just adding a few .cuda() calls. But can I achieve the GPU latency reported in the paper in this way?
As I said, optimizing it could take some effort. Considerations include keeping memory aligned and contiguous, and GPU top-k efficiency is also tricky. It is also likely to be hardware dependent.
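To illustrate the kind of details meant here (contiguity and top-k cost only; this is not the optimized implementation behind the paper's latency numbers, and all names are made up):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A transposed or sliced index tensor becomes a non-contiguous view,
# which can slow down GPU kernels; .contiguous() re-lays it out in memory.
index = torch.randn(200_000, 768, device=device)
index_t = index.T.contiguous()           # pay the copy once, score faster

q = torch.randn(32, 768, device=device)  # a batch of query vectors
scores = q @ index_t                     # (32, 200_000) scores

# torch.topk over a very wide dimension can itself be a bottleneck on GPU;
# its cost varies with k, the score-matrix width, and the specific hardware.
top_scores, top_ids = torch.topk(scores, k=1000, dim=-1)
```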
I see. The original experimental implementation includes many optimization tricks.
I will try simply adding the .cuda() calls and look forward to your optimized GPU retrieval code.
Thank you!