mahout
[Roadmap] [QDP] GPU/CUDA kernel implementation
This is the umbrella roadmap for GPU-related progress. Feel free to create an issue and link it to this roadmap:
- Implement kernel #677
- Optimizations: #706
  - Parallel normalization kernel
  - Coalesced memory access patterns
  - Warp-level optimizations
  - Stream support for async execution
- Implement an optional CUDA test @ryankert01
  - If no CUDA device is present, the test is skipped
- Implement suitable benchmarks
  - The Scaling Test (latency vs. qubits)
  - The DataLoader Test (batch throughput) #687
- Gracefully handle OOM #688
- Future encoding methods:
  - launch_angle_encode (angle encoding)
  - launch_basis_encode (basis encoding)
  - launch_iqp_encode (IQP encoding)
- Apache license pre-commit check (CUDA) #684
- (After the PoC) Move pre-processing from CPU to GPU
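For the "Parallel normalization kernel" item, a CPU reference can help validate GPU output. The sketch below assumes the normalization in question is the L2 normalization used by amplitude encoding (zero-pad to 2^n, divide by the norm); that is an assumption, not something this roadmap states. The pairwise loop mimics how a parallel reduction combines partial sums level by level. `tree_sum_of_squares` and `amplitude_encode` are hypothetical names, not Mahout APIs.

```python
import math


def tree_sum_of_squares(values):
    """Pairwise (tree) reduction of x*x, mirroring how a parallel
    reduction combines partial sums one level at a time."""
    partial = [v * v for v in values]
    if not partial:
        return 0.0
    while len(partial) > 1:
        # Combine neighbours pairwise; an odd trailing element is carried over.
        nxt = [partial[i] + partial[i + 1] for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            nxt.append(partial[-1])
        partial = nxt
    return partial[0]


def amplitude_encode(values, num_qubits):
    """Hypothetical reference: zero-pad to 2**num_qubits and scale to unit L2 norm."""
    dim = 1 << num_qubits
    if len(values) > dim:
        raise ValueError("input longer than 2**num_qubits")
    norm = math.sqrt(tree_sum_of_squares(values))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in values] + [0.0] * (dim - len(values))


state = amplitude_encode([3.0, 4.0], num_qubits=2)
print(state)  # [0.6, 0.8, 0.0, 0.0]
```

On a GPU the same reduction is typically split across threads and warps, so results may differ from this sequential reference in the last few floating-point bits; comparisons should use a tolerance.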
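The future encodings listed above (launch_angle_encode, launch_basis_encode) could be checked against small CPU references. The sketch below uses common textbook definitions, which are assumptions rather than the project's spec: angle encoding prepares each qubit as RY(x)|0> = cos(x/2)|0> + sin(x/2)|1>, with qubit i mapped to bit i of the state index (little-endian), and basis encoding maps an integer to a one-hot computational basis state. `angle_encode` and `basis_encode` are hypothetical names.

```python
import math


def angle_encode(features):
    """Hypothetical reference: one qubit per feature, each RY(x)|0>,
    combined by tensor product. Qubit i maps to bit i of the index (assumed)."""
    n = len(features)
    state = []
    for k in range(1 << n):
        amp = 1.0
        for i, x in enumerate(features):
            # Bit i of k selects the |1> (sin) or |0> (cos) component of qubit i.
            amp *= math.sin(x / 2) if (k >> i) & 1 else math.cos(x / 2)
        state.append(amp)
    return state


def basis_encode(index, num_qubits):
    """Hypothetical reference: integer in [0, 2**num_qubits) -> basis state |index>."""
    dim = 1 << num_qubits
    if not 0 <= index < dim:
        raise ValueError("index out of range")
    state = [0.0] * dim
    state[index] = 1.0
    return state


print(basis_encode(5, 3))  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Since every amplitude of the angle-encoded state is an independent product, a GPU version can assign one thread per output index, which is a natural fit for the coalesced-access item above. IQP encoding is omitted here because its definition varies between sources.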
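For the optional CUDA test, one possible pattern, assuming the tests are written in Python with the standard `unittest` module, is to probe for a device once and skip the whole test class when none is found. The probe here is a hypothetical helper that shells out to `nvidia-smi -L`; a real implementation would more likely ask the CUDA runtime or the project's own bindings.

```python
import shutil
import subprocess
import unittest


def cuda_device_present():
    """Hypothetical best-effort probe: True if `nvidia-smi -L` lists a GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        out = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return False
    return out.returncode == 0 and "GPU" in out.stdout


# The probe runs once at import time; without a device every test is reported as skipped.
@unittest.skipUnless(cuda_device_present(), "no CUDA device available")
class CudaKernelTest(unittest.TestCase):
    def test_kernel_runs(self):
        # Placeholder: call the real GPU kernel binding here and check its output.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```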
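The Scaling Test (latency vs. qubits) can be driven by a small harness that times an encode callable at increasing qubit counts. This is a sketch: `reference_encode` is a hypothetical CPU stand-in workload, and `scaling_test` assumes the callable takes only the qubit count.

```python
import math
import statistics
import time


def reference_encode(num_qubits):
    """Hypothetical stand-in workload: L2-normalize a 2**num_qubits vector on the CPU."""
    dim = 1 << num_qubits
    data = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in data))
    return [x / norm for x in data]


def scaling_test(encode, qubit_range, repeats=5):
    """Return {num_qubits: median latency in seconds} for `encode`."""
    results = {}
    for n in qubit_range:
        encode(n)  # warm-up run, excluded from timing
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            encode(n)
            samples.append(time.perf_counter() - start)
        results[n] = statistics.median(samples)
    return results


for n, secs in scaling_test(reference_encode, range(4, 17, 4)).items():
    print(f"{n:2d} qubits: {secs * 1e3:.3f} ms")
```

When the callable launches CUDA kernels, keep in mind that launches are asynchronous: the timed region should end with a device synchronization, otherwise the measurement captures only launch overhead. Reporting the median reduces the influence of occasional outliers.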