Manish Gupta comments

Results: 30 comments by Manish Gupta

Add couple configs into generator.py for mixed input MM

> to be symmetric w.r.t. math_instructions and tile_descriptions.

What do you mean by symmetric (the same?)? The Tensor Core math_instruction shape for both upcast_a and upcast_b is 16816. The supported tile_description (more...
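As a worked illustration of what the `16816` instruction shape means: it is a Tensor Core MMA of M=16, N=8, K=16 (the PTX `mma` shape `m16n8k16`). In a simplified model, a warp tile must be a multiple of that shape in every dimension, and the warp issues (M_w/16)·(N_w/8)·(K_w/16) instructions to cover it. The names `GemmShape`, `fits_instruction`, and `mma_count` below are hypothetical, not CUTLASS APIs, and the sketch does not capture the full tile_description constraints discussed in the comment.

```cpp
// Hypothetical (M, N, K) triple; not CUTLASS's GemmShape template.
struct GemmShape {
  int m;
  int n;
  int k;
};

// "16816": M=16, N=8, K=16, shared by the upcast_a and upcast_b paths.
constexpr GemmShape kInstr16816{16, 8, 16};

// Simplified check: each warp-tile extent must be a multiple of the instruction extent.
constexpr bool fits_instruction(GemmShape warp, GemmShape instr) {
  return warp.m % instr.m == 0 && warp.n % instr.n == 0 && warp.k % instr.k == 0;
}

// Number of MMA instructions one warp issues to cover its warp tile.
constexpr int mma_count(GemmShape warp, GemmShape instr) {
  return (warp.m / instr.m) * (warp.n / instr.n) * (warp.k / instr.k);
}
```

For example, a 64x64x64 warp tile gives 4 x 8 x 4 = 128 instructions per warp, while a warp tile with N=60 does not fit because 60 is not a multiple of 8.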

Add couple configs into generator.py for mixed input MM

For `math_instructions`, that makes sense. Yes, we should support the combinations you listed. Once you add those, please ensure the [references](https://github.com/NVIDIA/cutlass/blob/main/tools/library/src/reference/gemm_fp_mixed_input.cu) for these are also in place, run...

Add couple configs into generator.py for mixed input MM

1. For your mixed-input case, add a device-level unit test. Follow the similar unit test [here](https://github.com/alexsamardzic/cutlass/blob/bf3d57158440a6ea00ea17b139ba7df8243b7acd/test/unit/gemm/device/CMakeLists.txt#L247).
2. You should also test whether the profiler is working with verification for your...

Add couple configs into generator.py for mixed input MM

Thank you for the change. Overall, it looks good. Can you do the following? 1. CUTLASS Profiler output for all the mixed-input GEMMs ``` build $ cmake ../cutlass/ -DCUTLASS_NVCC_ARCHS="90a" -DCUTLASS_ENABLE_F16C=ON -DCMAKE_BUILD_TYPE=Release...

Add couple configs into generator.py for mixed input MM

> @manishucsd, do you mean this PR or 1190? I did all the testing for this PR myself.

I meant this one. However, this one is adding only unit...

Add couple configs into generator.py for mixed input MM

> > ```
> > 2. I do see some unverified rows in the output of cutlass profiler for mixed-input runs.
> > ```
> > I've built and ran...

Add couple configs into generator.py for mixed input MM

LGTM. @hwu36 and CUTLASS team, can you please merge this? cc: @alexsamardzic

Add couple configs into generator.py for mixed input MM

Checking the status of this reviewed PR. Has it already been merged?

Refactor to use FastDivmod for predicated strided dgrad iterators.

@ZelboK, you can compile and run only the align8 kernels for this shape. Use the string `cutlass_tensorop_h16816dgrad_optimized*align8` for cmake and when running the cutlass_profiler. The results in [comparison_hgrad.csv](https://github.com/NVIDIA/cutlass/files/14878462/comparison_hgrad.csv) are with both loads...

Refactor to use FastDivmod for predicated strided dgrad iterators.

Thanks @ZelboK for the work and analysis on this. `hgrad.csv` presents one problem size running with different tile configurations. Looking at the data in `hgrad.csv`, the FastDivMod refactoring in...
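For context on the refactor discussed in this thread: the FastDivmod idea is to replace a runtime integer division and modulo by a divisor that stays fixed for the kernel's lifetime (such as a stride or a tensor extent) with a precomputed multiply-high and shift. Below is a minimal host-side sketch of that technique, assuming dividends in [0, 2^31) and divisors in [1, 2^31). The struct `FastDivmodSketch` and its helper `umulhi` (which emulates CUDA's `__umulhi` intrinsic) are hypothetical names; CUTLASS's own `cutlass::FastDivmod` in `cutlass/fast_math.h` is the version the kernels actually use.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of division by an invariant divisor (not CUTLASS code).
// Valid for dividends n in [0, 2^31) and divisors d in [1, 2^31).
struct FastDivmodSketch {
  uint32_t divisor;
  uint32_t multiplier;   // ceil(2^p / d), fits in 32 bits for d >= 2
  uint32_t shift_right;  // p - 32

  // ceil(log2(x)) for x >= 1.
  static uint32_t ceil_log2(uint32_t x) {
    uint32_t l = 0;
    while ((uint64_t(1) << l) < x) ++l;
    return l;
  }

  explicit FastDivmodSketch(uint32_t d) : divisor(d), multiplier(0), shift_right(0) {
    assert(d >= 1 && d < (uint32_t(1) << 31));
    if (d != 1) {
      // p = 31 + ceil(log2(d)) keeps the rounding error small enough
      // for every dividend below 2^31.
      uint32_t p = 31 + ceil_log2(d);
      multiplier = uint32_t(((uint64_t(1) << p) + d - 1) / d);
      shift_right = p - 32;
    }
  }

  // Emulates CUDA's __umulhi: upper 32 bits of the 64-bit product.
  static uint32_t umulhi(uint32_t a, uint32_t b) {
    return uint32_t((uint64_t(a) * b) >> 32);
  }

  // Quotient via multiply-high and shift; remainder via one multiply-subtract.
  void divmod(uint32_t n, uint32_t& q, uint32_t& r) const {
    q = (divisor != 1) ? (umulhi(n, multiplier) >> shift_right) : n;
    r = n - q * divisor;
  }
};
```

Usage: `FastDivmodSketch fd(7); uint32_t q, r; fd.divmod(100, q, r);` gives `q == 14` and `r == 2`. The 64-bit division happens once in the constructor, so each per-element `divmod` costs a multiply-high, a shift, a multiply, and a subtract, which is why hoisting divisors into such objects can pay off in index-heavy iterators.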