[QST] BatchGemm: how to add a bias vector to CHW row-major output
@d-k-b
Can you provide some details about the library/application you are linking cutlass with? Does it include CUDA language support already, for instance? Does it compile other .cu files successfully?
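typename GemmBatchedCuda_F32_F32_Linear_AlignCuda_Row_Row_Sm86::Arguments arguments{
    problem_size,
    {(float *)weight->DevicePtr(), C},      // A: weight, N x C row-major, lda = C
    (int64_t)(0),                           // batch stride A: weight shared by all batches
    {(float *)input->DevicePtr(), H * W},   // B: input, C x (H*W) row-major, ldb = H*W
    (int64_t)(C * H * W),                   // batch stride B
    {(float *)bias->DevicePtr(), H * W},    // C: bias, ldc = H*W
    (int64_t)(0),                           // batch stride C: bias shared by all batches
    {(float *)output->DevicePtr(), H * W},  // D: output, N x (H*W) row-major, ldd = H*W
    (int64_t)(N * H * W),                   // batch stride D
    {1.f},                                  // epilogue params
    B};                                     // batch count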
Layouts: weight is NC row-major, input is CHW row-major, output is NHW row-major, and bias is a vector of length N. weight x input gives an NHW row-major result, so a bias vector passed as the C operand must have length H*W, not N. How can I add a length-N bias vector to the NHW row-major matrix? One option might be to compute input (CHW column-major) x weight (NC column-major) = HWN column-major, in which case adding a length-N bias vector could work, but I don't want to add that many transpose operations.
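To pin down what is being asked, here is a minimal CPU sketch in plain C++ (not CUTLASS code) of the result I want, plus the row-major offset arithmetic that explains why a zero leading dimension only yields a length-H*W vector. The names `row_major_offset` and `reference_batched_gemm_bias` are made up for illustration, and `HW` stands for H * W.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of element (row, col) in a row-major matrix with leading dimension ld.
// With ld == 0 every row aliases the same memory, so the operand behaves like
// one vector indexed by col only: it varies along H*W and is broadcast over N.
inline std::int64_t row_major_offset(std::int64_t row, std::int64_t col,
                                     std::int64_t ld) {
    return row * ld + col;
}

// Hypothetical CPU reference for the desired result:
//   output[b][n][p] = sum_c weight[n][c] * input[b][c][p] + bias[n]
// where p runs over the flattened H*W positions.
std::vector<float> reference_batched_gemm_bias(
    const std::vector<float>& weight,  // N x C, row-major, shared by all batches
    const std::vector<float>& input,   // B x C x HW, row-major
    const std::vector<float>& bias,    // length N, one value per output row
    int B, int N, int C, int HW) {
    std::vector<float> output(static_cast<std::size_t>(B) * N * HW, 0.f);
    for (int b = 0; b < B; ++b) {
        for (int n = 0; n < N; ++n) {
            for (int p = 0; p < HW; ++p) {
                float acc = bias[n];  // bias depends on the row n, not on p
                for (int c = 0; c < C; ++c) {
                    const std::size_t w_idx = static_cast<std::size_t>(n) * C + c;
                    const std::size_t i_idx =
                        (static_cast<std::size_t>(b) * C + c) * HW + p;
                    acc += weight[w_idx] * input[i_idx];
                }
                output[(static_cast<std::size_t>(b) * N + n) * HW + p] = acc;
            }
        }
    }
    return output;
}
```

In this sketch the bias is constant along the contiguous H*W dimension and varies along the strided N dimension. A row-major operand with leading dimension 0 gives the opposite broadcast, which is why the C operand ends up needing H*W elements.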
@thakkarV