[FEA] Add cuTensorMapEncodeTiled to CudaHostAdapter
Summary
We are currently working on integrating an FP8 scaled matmul kernel written with CUTLASS into PyTorch. PyTorch has the constraint that it cannot be directly linked against the CUDA driver API. There is one direct call to a CUDA driver API symbol, cuTensorMapEncodeTiled, that is causing issues.
We have a temporary workaround here: https://github.com/pytorch/pytorch/pull/125204#discussion_r1618787335
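For context, here is a minimal sketch of one way to avoid the link-time dependency: resolve cuTensorMapEncodeTiled from the already-loaded driver library at runtime. This is only an illustration and not necessarily what the linked PyTorch PR does; the loader name `load_cuTensorMapEncodeTiled` is made up for the example.

```cpp
// Sketch: resolve cuTensorMapEncodeTiled at runtime so the binary has no
// link-time dependency on the CUDA driver (libcuda.so). The function-pointer
// type mirrors the driver API signature declared in cuda.h.
#include <dlfcn.h>
#include <cuda.h>

using cuTensorMapEncodeTiled_t = CUresult (*)(
    CUtensorMap*, CUtensorMapDataType, cuuint32_t, void*,
    const cuuint64_t*, const cuuint64_t*, const cuuint32_t*,
    const cuuint32_t*, CUtensorMapInterleave, CUtensorMapSwizzle,
    CUtensorMapL2promotion, CUtensorMapFloatOOBfill);

cuTensorMapEncodeTiled_t load_cuTensorMapEncodeTiled() {
  // Prefer the driver library that the CUDA runtime has already loaded.
  void* handle = dlopen("libcuda.so.1", RTLD_NOW | RTLD_NOLOAD);
  if (!handle) {
    handle = dlopen("libcuda.so.1", RTLD_NOW);
  }
  if (!handle) {
    return nullptr;
  }
  return reinterpret_cast<cuTensorMapEncodeTiled_t>(
      dlsym(handle, "cuTensorMapEncodeTiled"));
}
```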
There was a suggestion to add this symbol to CudaHostAdapter so as to add one more layer of indirection. This would greatly aid PyTorch in its use of CUTLASS.
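For illustration, a rough sketch of what that extra layer of indirection could look like: the host adapter owns a driver entry point supplied by the caller, so the kernel-construction code never makes a direct, link-time call into libcuda. The struct name `TensorMapHostAdapter` and the method `encode_tiled` are hypothetical; the actual hook added to CudaHostAdapter in CUTLASS 3.5.1 may be named and shaped differently.

```cpp
// Illustrative sketch only: route tensor-map encoding through a virtual hook
// backed by a caller-supplied function pointer (e.g. one resolved with dlopen
// as in the previous sketch) instead of calling the driver symbol directly.
#include <cuda.h>

using TensorMapEncodeFn = CUresult (*)(
    CUtensorMap*, CUtensorMapDataType, cuuint32_t, void*,
    const cuuint64_t*, const cuuint64_t*, const cuuint32_t*,
    const cuuint32_t*, CUtensorMapInterleave, CUtensorMapSwizzle,
    CUtensorMapL2promotion, CUtensorMapFloatOOBfill);

// Hypothetical host-adapter hook: the caller (e.g. PyTorch) decides how, and
// whether, libcuda is reached; the library only sees this interface.
struct TensorMapHostAdapter {
  explicit TensorMapHostAdapter(TensorMapEncodeFn fn) : encode_fn_(fn) {}
  virtual ~TensorMapHostAdapter() = default;

  virtual CUresult encode_tiled(CUtensorMap* map,
                                CUtensorMapDataType dtype,
                                cuuint32_t rank,
                                void* global_address,
                                const cuuint64_t* global_dim,
                                const cuuint64_t* global_strides,
                                const cuuint32_t* box_dim,
                                const cuuint32_t* element_strides,
                                CUtensorMapInterleave interleave,
                                CUtensorMapSwizzle swizzle,
                                CUtensorMapL2promotion l2_promotion,
                                CUtensorMapFloatOOBfill oob_fill) const {
    // Forward to the injected driver entry point; fail cleanly if absent.
    return encode_fn_
        ? encode_fn_(map, dtype, rank, global_address, global_dim,
                     global_strides, box_dim, element_strides, interleave,
                     swizzle, l2_promotion, oob_fill)
        : CUDA_ERROR_NOT_FOUND;
  }

 private:
  TensorMapEncodeFn encode_fn_;
};
```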
@kerrmudgeon
Curious if there is any update here?
CUTLASS 3.5.1 will ship with this before the end of the week.
@drisspg please verify and close? I think this is done, but follow-up work is required in https://github.com/NVIDIA/cutlass/issues/1624.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Closing as done with #1700