cuda-kat
Add wrappers (and builtins?) for more PTX instructions
The following PTX instructions have neither wrapper functions nor, where relevant, templated functions in `builtins::`. Add them!
- [ ] `lop3` - Logical operation on 3 operands using an immediate 3-parameter lookup table.
- [ ] prefetching instructions?
- [ ] `cvt.pack`
- [ ] `fns` - find n'th bit set
- [ ] Sub-32-bit dot product with accumulation: `dp4a`, `dp2a` for bytes and halfword, respectively.
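
As a starting point for discussion, here is a minimal sketch of what a `lop3` wrapper and a host-side reference for `dp4a` might look like. All names here (`SKETCH_HD`, `lop3`, `dp4a_reference`) are hypothetical and not part of the existing cuda-kat API. The inline-PTX branch assumes a device target that supports `lop3.b32` and has not been tested. The software fallback follows the lookup-table semantics in the PTX ISA documentation, where the bits of `a`, `b` and `c` at each position form a 3-bit index into the 8-bit immediate.

```cpp
#include <cstdint>

// Hypothetical helper macro: lets the same code build with nvcc or a plain C++ compiler.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Sketch of a lop3 wrapper. The lookup table is a template parameter because
// the PTX instruction requires it to be an immediate.
template <unsigned LookupTable>
SKETCH_HD inline std::uint32_t lop3(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    static_assert(LookupTable < 256, "lop3 takes an 8-bit immediate lookup table");
#ifdef __CUDA_ARCH__
    // Device path (untested assumption): emit the PTX instruction directly.
    std::uint32_t result;
    asm("lop3.b32 %0, %1, %2, %3, %4;"
        : "=r"(result)
        : "r"(a), "r"(b), "r"(c), "n"(LookupTable));
    return result;
#else
    // Host path: software emulation, one bit position at a time.
    std::uint32_t result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        // a supplies index bit 2, b supplies bit 1, c supplies bit 0.
        unsigned index = (((a >> bit) & 1u) << 2)
                       | (((b >> bit) & 1u) << 1)
                       |  ((c >> bit) & 1u);
        result |= ((LookupTable >> index) & 1u) << bit;
    }
    return result;
#endif
}

// Reference semantics for the signed-by-signed byte variant of dp4a:
// c plus the sum of the four byte-wise products of a and b.
// Intended only as a model to test a future wrapper against.
SKETCH_HD inline std::int32_t dp4a_reference(std::uint32_t a, std::uint32_t b, std::int32_t c)
{
    std::int32_t sum = c;
    for (int i = 0; i < 4; ++i) {
        auto x = static_cast<std::int8_t>((a >> (8 * i)) & 0xFFu);
        auto y = static_cast<std::int8_t>((b >> (8 * i)) & 0xFFu);
        sum += static_cast<std::int32_t>(x) * static_cast<std::int32_t>(y);
    }
    return sum;
}
```

The lookup table for a given boolean function can be derived by applying that function to the constants `0xF0`, `0xCC` and `0xAA`; for example, `0xF0 ^ 0xCC ^ 0xAA == 0x96`, so `lop3<0x96>(a, b, c)` computes `a ^ b ^ c`.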