cutlass
[FEA] Add cuTensorMapEncodeTiled to CudaHostAdapter
Summary
We are currently working on integrating an FP8 scaled matmul kernel written with CUTLASS into PyTorch. PyTorch has the constraint that it cannot be directly linked against the CUDA driver API. There is one symbol, a direct call to the CUDA driver API function cuTensorMapEncodeTiled, that is causing issues.
We have a temporary workaround here: https://github.com/pytorch/pytorch/pull/125204#discussion_r1618787335
There was a suggestion to add this symbol to CudaHostAdapter so as to add one more layer of indirection. This would greatly aid PyTorch in its use of CUTLASS.