cutlass [QST] BatchGemm: how to add bias(vector) to chw row major output

May 20 '24 09:05 Jellards

@d-k-b

May 20 '24 11:05 thakkarV

Can you provide some details about the library/application you are linking cutlass with? Does it include CUDA language support already, for instance? Does it compile other .cu files successfully?

May 20 '24 16:05 d-k-b

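typename GemmBatchedCuda_F32_F32_Linear_AlignCuda_Row_Row_Sm86::Arguments arguments{
    problem_size,
    {(float *)weight->DevicePtr(), C},     (int64_t)(0),          // A: weight, ld = C, batch stride 0 (shared by all batches)
    {(float *)input->DevicePtr(), H * W},  (int64_t)(C * H * W),  // B: input, ld = H * W
    {(float *)bias->DevicePtr(), H * W},   (int64_t)(0),          // C: bias, ld = H * W, batch stride 0
    {(float *)output->DevicePtr(), H * W}, (int64_t)(N * H * W),  // D: output, ld = H * W
    {1.f},                                                        // epilogue parameters
    B};                                                           // batch count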

Layouts: weight (NC, row-major), input (CHW, row-major), output (NHW, row-major), bias (N, row-major). weight x input produces an NHW row-major result, so with this setup the bias vector has to have length H*W, not N. How can I add a length-N bias vector to the NHW row-major matrix? Maybe I could change it to input (CHW, column-major) x weight (NC, column-major) = HWN column-major, and then adding the length-N bias vector would work. But I don't want to add that many transpose operations.
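To make the intended result concrete, here is a minimal CPU reference sketch for a single batch, with P standing for H*W. It is plain C++ and does not use any CUTLASS API; the function names `gemm_bias_per_row` and `gemm_bias_per_col_colmajor` are hypothetical and only for illustration. The first function computes the per-row (length-N) bias add on the row-major N x P output. The second reads the same buffers, unmodified, as column-major matrices (input as P x C, weight as C x N, output as P x N), where the bias becomes a per-column vector. Since a row-major N x P buffer has the same memory layout as a column-major P x N buffer, the two should write identical output. Whether a particular CUTLASS kernel configuration can express the second form is not shown here.

```cpp
#include <cstddef>
#include <vector>

// Row-major view: out[n][p] = bias[n] + sum_c weight[n][c] * input[c][p]
// weight is N x C, input is C x P, out is N x P, all row-major (P = H * W).
std::vector<float> gemm_bias_per_row(const std::vector<float>& weight,
                                     const std::vector<float>& input,
                                     const std::vector<float>& bias,
                                     std::size_t N, std::size_t C, std::size_t P) {
    std::vector<float> out(N * P);
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t p = 0; p < P; ++p) {
            float acc = bias[n];  // one bias value per output row
            for (std::size_t c = 0; c < C; ++c) {
                acc += weight[n * C + c] * input[c * P + p];
            }
            out[n * P + p] = acc;
        }
    }
    return out;
}

// Column-major view of the SAME buffers, with no data movement:
// input is P x C (ld = P), weight is C x N (ld = C), out is P x N (ld = P).
// out(i, j) = bias[j] + sum_k input(i, k) * weight(k, j), so bias is per column.
std::vector<float> gemm_bias_per_col_colmajor(const std::vector<float>& weight,
                                              const std::vector<float>& input,
                                              const std::vector<float>& bias,
                                              std::size_t N, std::size_t C, std::size_t P) {
    std::vector<float> out(P * N);
    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t i = 0; i < P; ++i) {
            float acc = bias[j];  // one bias value per output column
            for (std::size_t k = 0; k < C; ++k) {
                acc += input[i + k * P] * weight[k + j * C];
            }
            out[i + j * P] = acc;
        }
    }
    return out;
}
```

Calling both functions on the same weight, input, and bias vectors should give element-wise equal results, which suggests the column-major formulation is a reinterpretation of the existing buffers rather than an extra transpose pass.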

@thakkarV

May 22 '24 09:05 Jellards