Ivan Komarov
Sure! I used the following setup when I was testing this:
1. A server with a public IPv4 address (e.g., `1.2.3.4`) to host `fake_{dns,git}.py` on.
1. A domain whose DNS...
> force-pushed

Changed the author of the commit, no functional changes intended.
@hwu36 Hi, could you please take a quick look at this PR?
@jackkosaian
> Is that a possibility for your application?

That would imply a performance hit for two different reasons:
1. When `k=0` (as opposed to `m=0` or `n=0`), you still...
I was a little unclear here: in the `grouped_gemm` library, cuBLAS is only used as a fallback that launches multiple regular GEMMs instead of a single grouped GEMM. In this...
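To make the fallback concrete, here is a minimal sketch in plain Python of running one regular GEMM per group instead of a single grouped GEMM. The names `gemm` and `grouped_gemm_fallback` are hypothetical and are not the `grouped_gemm`, cuBLAS, or CUTLASS API. The sketch assumes the usual convention that a product with `k=0` is an empty sum, so the output is an all-zero `m x n` matrix.

```python
def gemm(a, b, m, k, n):
    """Multiply an m x k matrix by a k x n matrix (both given as lists of rows)."""
    # When k == 0 the inner sum is empty, so every entry of the m x n result is 0.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def grouped_gemm_fallback(problems):
    """Hypothetical fallback: one regular GEMM per group, run in a loop."""
    return [gemm(a, b, m, k, n) for (a, b, m, k, n) in problems]


problems = [
    ([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2, 2, 2),  # ordinary 2x2 by 2x2 group
    ([[], []], [], 2, 0, 3),                        # group with k = 0
]
results = grouped_gemm_fallback(problems)
```

In this sketch the `k=0` group still produces a well-defined `2 x 3` output, which is the case that differs from `m=0` or `n=0`, where the output itself is empty.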
> Otherwise, please respond with a comment indicating any updates.

Well, I still think this is something worth fixing in CUTLASS directly. For now, I implemented a [workaround](https://github.com/tgale96/grouped_gemm/pull/14/files#diff-bffec0006eb97fc116595ecdecac2dbf848af93a6d61247922a60e467c0fb615R153) in `grouped_gemm`,...
@mnicely Thank you, this sounds great! A couple of follow-up questions:
1. I am a little confused as to how cuBLAS can resolve it if they appear to be using a...
> Otherwise, please respond with a comment indicating any updates

An eventual fix for this on the CUTLASS side would be much appreciated, but I understand that this is low-priority...
> an alternative approach that would make use of the cache is doing something like `let level = Ord::min(level, 3)`

I just tried that, but it appears to be trickier...