[QST] Question About the Use of MMA In-Flight in SS_WarpSpecialized
I don’t understand why an MMA is kept in flight in SS_WarpSpecialized. The pipeline diagram below illustrates my understanding. With an MMA in flight, the producer cannot start loading the next set of values into a stage as soon as the computation on that stage has finished. Instead, the stage seems to be released only after the next MMA finishes, so the producer has to wait that long before it can refill the current stage. Does this wait make sense?
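To make the timing question concrete, here is a toy timeline model, not CUTLASS code. It assumes (and this ordering is an assumption, not something confirmed in this thread) that the consumer issues MMA k asynchronously, then blocks until at most `in_flight` MMAs are still outstanding, and only then releases the oldest stage back to the producer. The names `simulate`, `t_mma`, and `t_issue` are hypothetical, and the time units are arbitrary.

```python
def simulate(num_tiles, in_flight, t_mma=10, t_issue=1):
    """Toy model of a consumer loop that keeps `in_flight` MMAs outstanding.

    Returns (release, mma_end):
      release[k] = time at which stage k is handed back to the producer
      mma_end[k] = time at which MMA k finishes on a single serial MMA unit
    """
    now = 0          # clock of the consumer warp group
    mma_end = []     # completion time of each issued MMA
    release = {}

    for k in range(num_tiles):
        now += t_issue                       # issuing MMA k costs some overhead
        prev_end = mma_end[-1] if mma_end else 0
        start = max(now, prev_end)           # MMAs execute one after another
        mma_end.append(start + t_mma)

        oldest = k - in_flight               # this MMA must be complete now
        if oldest >= 0:
            now = max(now, mma_end[oldest])  # block until it has finished
            release[oldest] = now            # then release its stage

    # Drain: wait for the MMAs still outstanding after the last issue.
    for k in range(max(0, num_tiles - in_flight), num_tiles):
        now = max(now, mma_end[k])
        release[k] = now

    return release, mma_end


if __name__ == "__main__":
    for n in (0, 1):
        release, mma_end = simulate(num_tiles=4, in_flight=n)
        print(f"in_flight={n} release={release} mma_end={mma_end}")
```

Under these assumptions, with `in_flight=1` stage k is released once MMA k has finished and MMA k+1 has already been issued, not once MMA k+1 has finished. The MMA unit also no longer idles during the issue overhead between tiles. Whether the real mainloop follows this ordering is exactly what I would like to confirm.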