mahout
[QDP] GPU tensor zero-copy via DLPack protocol
Summary
This issue requests GPU tensor support through the DLPack protocol for zero-copy data transfer. This is critical for training-loop performance, where tensors already reside on the GPU.
- Extend encode_tensor() to handle GPU tensors
- Add DLPack extraction
- Add GPU pointer validation
- Add stream synchronization
- Add direct GPU processing
If anything important is missing, please leave a comment.