Mixed Precision Grouped Gemm with zero points and GPT-Q semantics closes #2261
sorry, running a bit behind. we will get to it soon.
@ankutalev Thanks for submitting this feature MR. Have you checked the functionality of this feature? Could you post the result of running this feature (example 69) here?
Yes, I checked: it reports "Disposition Passed" for all scenarios ({shuffled/unshuffled} X {direct convert, no zeros, zeros, gptq}). That is itself a problem, because the new gptq semantics dequantizes the matrix in a different way; the test in the example is too weak to catch the difference.
I can provide unit tests if you like.
Also, I'm not happy with how I implemented the gptq mode switch, but runtime parameters seem "not CUTLASS style"; I would appreciate any advice and suggestions here =)
We are interested in having this functionality in the main branch, because nobody likes maintaining patched forks =)
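For reference on the unit tests offered above, here is a minimal sketch of per-group zero-point dequantization. The function name, data layout, and the `(q - z) * s` convention are illustrative assumptions only; the PR's gptq path may handle zero points differently, which is exactly the semantic difference the example's "Disposition Passed" check fails to distinguish.

```python
def dequantize(q, scales, zeros, group_size):
    """Reference per-group zero-point dequantization: w = (q - z) * s.

    q:          flat list of quantized integer values
    scales:     one float scale per group
    zeros:      one integer zero point per group
    group_size: number of consecutive values sharing a scale/zero pair
    """
    return [(qi - zeros[i // group_size]) * scales[i // group_size]
            for i, qi in enumerate(q)]

# Two groups of two values each.
print(dequantize([5, 7, 2, 4], scales=[0.5, 2.0], zeros=[4, 3], group_size=2))
# → [0.5, 1.5, -2.0, 2.0]
```

A unit test that compares kernel output against a reference like this for each scenario separately (zeros vs. gptq) would catch a semantic divergence that an end-to-end pass/fail disposition cannot.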
@Junkai-Wu Hi! Any updates here?
@ankutalev we are reviewing the changes internally. We will merge this PR once it is approved and merged in our internal repo.
Hi! Any news here?
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
@ankutalev we are reviewing the changes internally. We will merge this PR once it is approved and merged in our internal repo.
Gentle ping