Aleksandar Samardžić

Results 60 comments of Aleksandar Samardžić

Rebased on the latest main and made several updates, the most important of which is that the `cutlass::gemm::threadblock::MmaMultistage` class has been changed to support 4-bit mixed data-type GEMM. A new test `gemm_universal_f16t_s4n_f16t_mixed_input_tensor_op_f16_sm80.cu` is...

Hi guys, would it be possible for you to provide any feedback on this PR at this stage? This functionality is really needed for PyTorch, and I recently updated the...

By symmetry, I meant the `math_instructions` list within the given generator methods: I was thinking that if the `GenerateSM80_SparseTensorOp_16832` method has, for example, the `DataType.f16, DataType.f16, DataType.f32` combination listed there, then the `upcast_a` method...

Thanks for the clarification. I've updated `gemm_fp_mixed_input.cu` in my PR. W.r.t. verification - is there an "official" way to do it? I've checked that, on A100, whenever for example there...

Asking again: how do I properly run verification after my changes?

Thanks for the clarifications. The PR is updated with the suggested changes: added a number of tests, so that everything should now be consistent between the tests, [`generator.py`](https://github.com/NVIDIA/cutlass/blob/main/python/cutlass_library/generator.py) and [`gemm_fp_mixed_input.cu`](https://github.com/NVIDIA/cutlass/blob/main/tools/library/src/reference/gemm_fp_mixed_input.cu). Also fixed...

@hwu36: Thanks for the test fix! The problem with the configurations added in your commit is that they won't work: one could try to change, for example, exactly the...

> I did not change unit test.

I meant fixing the test name :-)

> The reason that profiler cannot do 128x32 is due to epilogue alignment. I fixed...

> 2. I do see some unverified rows in the output of cutlass profiler for mixed-input runs.

I've built and run the profiler according to the instructions you provided above, and...

> i will merge this pr after 3.5.1 pr is merged.
>
> @manishucsd is reviewing pr1190. that one changed mainloop, it will take a while.

@manishucsd: Maybe you could...