Shan19900305
1) Use the items stored in torch._tensor_classes to check the item passed from the Python side; 2) Add SparsePrivateUse1 to backend_to_string, layout_from_backend and check_base_legacy_new; 3) Use a more general API to get the Python module...
Support the PrivateUse1 key in the RNN op.
Use the new autocast APIs that take a device type name. cc @mcarilli @ptrblck @leslie-fang-intel @jgong5
**In the function coalesce_x, the shape value falls back to a default Int{} when the last shape value equals the constant one. Why is this needed?** ` template CUTE_HOST_DEVICE constexpr auto coalesce_x(Layout const& layout)...
Implementations: 1) Function with stride: https://github.com/NVIDIA/cutlass/blob/main/include/cute/stride.hpp#L102 2) Function without stride: https://github.com/NVIDIA/cutlass/blob/main/include/cute/stride.hpp#L152 The function with stride is easy to understand, but for the one without stride, why does it need to compute: i = c0 + s0 *...
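To make the question about the stride-free variant concrete, here is a minimal Python sketch. It is not CuTe's code. It assumes the truncated formula follows the usual nested (Horner-style) pattern for a compact column-major layout, and all function names are hypothetical. Under that assumption, the nested expression gives the same linear index as the inner product of the coordinate with the strides (1, s0, s0*s1, ...).

```python
from itertools import accumulate
from operator import mul


def compact_col_major_strides(shape):
    """Strides of a compact column-major layout: (1, s0, s0*s1, ...)."""
    return tuple(accumulate((1,) + tuple(shape[:-1]), mul))


def index_with_strides(coord, strides):
    """Explicit-stride form: inner product of coordinate and strides."""
    return sum(c * d for c, d in zip(coord, strides))


def index_nested(coord, shape):
    """Stride-free form (assumed): c0 + s0 * (c1 + s1 * (c2 + ...)).

    Evaluated from the innermost term outward, so no strides are materialized.
    """
    idx = 0
    for c, s in zip(reversed(coord), reversed(shape)):
        idx = c + s * idx
    return idx


shape = (3, 4, 5)
coord = (2, 1, 3)
strides = compact_col_major_strides(shape)   # (1, 3, 12)
assert index_with_strides(coord, strides) == index_nested(coord, shape)
```

If this reading is right, the stride-free form is the strided form with the strides implied by the shape, so the two functions would agree whenever the layout is compact and column-major.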