cutlass Make runtime assert clearer on CUDA

As it stands, when a runtime assert fires on CUDA platforms, your program just crashes with no stack trace and no mention of the error that was encountered. I just spent multiple hours debugging an issue where a CUTE_RUNTIME_ASSERT fired because I had compiled for sm90 instead of sm90a. If the error message had been printed when CUTE_RUNTIME_ASSERT fired, this would have taken thirty seconds.

My understanding is that CUTE_RUNTIME_ASSERT should never be reached in correct code, so even though printf() consumes resources, including it should be fine.
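To make the idea concrete, here is a minimal sketch of an assert macro that prints its message before stopping. The names `SKETCH_RUNTIME_ASSERT` and `sketch_assert_message` are hypothetical and exist only for illustration; this is not the actual CuTe definition, and the message format is an assumption.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical host-side helper: builds the text the assert reports.
inline std::string sketch_assert_message(const char* file, int line, const char* msg) {
  return std::string("Runtime assert at ") + file + ":" + std::to_string(line) + ": " + msg;
}

#if defined(__CUDA_ARCH__)
// Device path: print the message with device-side printf, then trap.
#  define SKETCH_RUNTIME_ASSERT(msg)                                        \
     do {                                                                   \
       printf("Runtime assert at %s:%d: %s\n", __FILE__, __LINE__, (msg));  \
       __trap();                                                            \
     } while (0)
#else
// Host path: print the same message to stderr, then abort.
#  define SKETCH_RUNTIME_ASSERT(msg)                                        \
     do {                                                                   \
       std::fprintf(stderr, "%s\n",                                         \
                    sketch_assert_message(__FILE__, __LINE__, (msg)).c_str()); \
       std::abort();                                                        \
     } while (0)
#endif
```

A call site would then look like `SKETCH_RUNTIME_ASSERT("this code path requires sm90a");`, so the reason for the failure is reported alongside the file and line instead of the program stopping silently.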

Oct 06 '23 00:10 sophiawisdom

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

Nov 09 '23 17:11 github-actions[bot]

This PR has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

Feb 07 '24 17:02 github-actions[bot]