Manish Gupta

30 comments by Manish Gupta

> to be symmetric w.r.t. `math_instructions` and `tile_descriptions`. What do you mean by symmetric (the same)? The Tensor Core `math_instruction` shape for both `upcast_a` and `upcast_b` is 16816. The supported `tile_description`...

For `math_instructions`, that makes sense. Yes, we should support the combinations you listed. Once you add those, please make sure the [references](https://github.com/NVIDIA/cutlass/blob/main/tools/library/src/reference/gemm_fp_mixed_input.cu) for them are also in place, run...
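
A mixed-input reference can be modeled as upcasting the narrow operand to the wide type, running an ordinary GEMM, and then comparing the kernel's output against that result within a tolerance. The sketch below is a hypothetical plain-Python illustration of the `upcast_a` case. `reference_mixed_gemm`, `matches`, and the tolerances are assumptions. It is not the code in `gemm_fp_mixed_input.cu`, and Python floats stand in for the half-precision type.

```python
import math
from typing import List

Matrix = List[List[float]]


def upcast_int8(matrix: List[List[int]]) -> Matrix:
    """Upcast the narrow operand (int8 values held as Python ints) to the wide type.

    Python floats stand in for the half-precision type in this sketch.
    """
    for row in matrix:
        for x in row:
            if not -128 <= x <= 127:
                raise ValueError(f"{x} is outside the int8 range")
    return [[float(x) for x in row] for row in matrix]


def reference_mixed_gemm(a_int8: List[List[int]], b: Matrix, c: Matrix,
                         alpha: float = 1.0, beta: float = 0.0) -> Matrix:
    """Hypothetical upcast_a reference: D = alpha * (upcast(A) @ B) + beta * C."""
    a = upcast_int8(a_int8)
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    d = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))  # accumulate in float
            row.append(alpha * acc + beta * c[i][j])
        d.append(row)
    return d


def matches(actual: Matrix, expected: Matrix,
            rel_tol: float = 1e-2, abs_tol: float = 1e-2) -> bool:
    """Element-wise tolerance check of a kernel result against the reference."""
    if len(actual) != len(expected):
        return False
    return all(
        len(ra) == len(re)
        and all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                for x, y in zip(ra, re))
        for ra, re in zip(actual, expected)
    )
```

For the `upcast_b` case, the same structure applies with `B` upcast instead of `A`.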

1. For your mixed-input case, add a device-level unit test. Follow the similar unit test [here](https://github.com/alexsamardzic/cutlass/blob/bf3d57158440a6ea00ea17b139ba7df8243b7acd/test/unit/gemm/device/CMakeLists.txt#L247).
2. You should also test whether the profiler is working with verification for your...

Thank you for the change. Overall it looks good. Can you do the following?

1. CUTLASS Profiler output for all the mixed-input GEMMs:

   ```
   build $ cmake ../cutlass/ -DCUTLASS_NVCC_ARCHS="90a" -DCUTLASS_ENABLE_F16C=ON -DCMAKE_BUILD_TYPE=Release...
   ```

> @manishucsd, do you mean this PR or 1190? I did all the testing for this PR myself.

I meant this one. However, this one is adding only unit...

> > ```
> > 2. I do see some unverified rows in the output of cutlass profiler for mixed-input runs.
> > ```
> >
> > I've built and run...
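
One way to list such rows is to scan the CSV the profiler writes, for example via its `--output` option. The sketch below assumes the CSV has a `Disposition` column and that fully verified rows read `passed`. The helper name `unverified_rows` and those column values are assumptions, so check them against an actual output file.

```python
import csv
from typing import Dict, Iterable, List


def unverified_rows(lines: Iterable[str], column: str = "Disposition",
                    ok_value: str = "passed") -> List[Dict[str, str]]:
    """Return CSV rows whose verification disposition is not `ok_value`.

    Assumption: the first line is a header containing `column`. Any row whose
    value in that column differs from `ok_value` (including empty) is reported.
    """
    reader = csv.DictReader(lines)  # accepts an open file or any iterable of lines
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in CSV header")
    return [row for row in reader if (row.get(column) or "").strip() != ok_value]
```

Typical use would be `with open(path) as f: bad = unverified_rows(f)`, where `path` is whatever file the profiler run produced.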

LGTM. @hwu36 and CUTLASS team, can you please merge this? cc: @alexsamardzic

Checking on the status of this reviewed PR. Has it already been merged?

@ZelboK, you can compile and run only the align8 kernels for this shape. Use the string `cutlass_tensorop_h16816dgrad_optimized*align8` both for CMake and when running the `cutlass_profiler`. The results in [comparison_hgrad.csv](https://github.com/NVIDIA/cutlass/files/14878462/comparison_hgrad.csv) are with both loads...
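
The `*` in that string acts as a wildcard, so kernels whose names carry the prefix and end in `align8` are kept, while other alignments are dropped. As a rough illustration, the sketch below applies the same pattern with Python's `fnmatchcase`. `select_kernels` and the kernel names in the test are hypothetical. `fnmatchcase` requires the whole name to match, whereas the filters used by CMake (`CUTLASS_LIBRARY_KERNELS`) and by the profiler (`--kernels`) may follow different matching rules, such as substring matching.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_kernels(names: Iterable[str], pattern: str) -> List[str]:
    """Keep kernel names that fully match a shell-style wildcard pattern.

    This approximates the `*` wildcard in the filter string. It is not
    CUTLASS's own matching logic.
    """
    return [name for name in names if fnmatchcase(name, pattern)]
```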

Thanks @ZelboK for the work and analysis on this. The `hgrad.csv` presents one problem size running with different tile configurations. Looking at the data in `hgrad.csv`, the FastDivMod refactoring in...
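
FastDivMod replaces an integer division by a runtime-constant divisor with a precomputed multiply and shift, which is cheaper on the GPU than a general integer division. The sketch below shows the general round-up "magic number" technique in Python for non-negative 31-bit dividends. `find_divisor` and `fast_divmod` loosely follow CUTLASS's naming, but this only illustrates the technique. It is not CUTLASS's code and does not reproduce the refactoring discussed above.

```python
from typing import Tuple


def ceil_log2(x: int) -> int:
    """Smallest l such that 2**l >= x, for x >= 1."""
    return (x - 1).bit_length()


def find_divisor(d: int) -> Tuple[int, int]:
    """Precompute (multiplier m, shift p) for dividing by d.

    With l = ceil(log2(d)), p = 31 + l and m = ceil(2**p / d), the error
    e = m * d - 2**p satisfies 0 <= e < d <= 2**l, so for 0 <= n < 2**31
    we have n * e < 2**p and floor(n / d) == (n * m) >> p.
    """
    if not 1 <= d < 2**31:
        raise ValueError("divisor must be in [1, 2**31)")
    p = 31 + ceil_log2(d)
    m = ((1 << p) + d - 1) // d  # ceil(2**p / d)
    return m, p


def fast_divmod(n: int, d: int, m: int, p: int) -> Tuple[int, int]:
    """Quotient via multiply + shift (no division), remainder via multiply-subtract."""
    if not 0 <= n < 2**31:
        raise ValueError("dividend must be in [0, 2**31)")
    q = (n * m) >> p
    return q, n - q * d
```

On the device, `(n * m) >> p` would typically be split into the high 32 bits of the product (for example via the CUDA intrinsic `__umulhi`) followed by a shift of `p - 32`, with `d == 1` special-cased because `p - 32` would then be negative.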