Feature: gemmd product with 4th loop parallelization

Open JerryMaoQC opened this issue 4 years ago • 5 comments

Summary:

  • Implement the gemmd product, which is gemm with a diagonal matrix of "weights" inserted in the middle. Formally, compute A * diag(d) * B for A (m x k), B (k x n), and d (a vector of length k).
  • Enable parallelization on the 4th loop (the PC loop), currently via OpenMP only.
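For readers who want the definition in concrete terms, here is a minimal, unoptimized sketch of what the operation computes. It is not the sandbox implementation: the function name `gemmd_ref`, the row-major layout, and the `double` element type are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) gemmd: C := A * diag(d) * B.
 * A is m x k, B is k x n, d has k entries, C is m x n.
 * Assumption: all matrices are dense and row-major.
 * gemmd_ref is a hypothetical name, not a BLIS API. */
static void gemmd_ref(size_t m, size_t n, size_t k,
                      const double *A, const double *d,
                      const double *B, double *C)
{
    assert(A && d && B && C);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            /* The weight d[p] is applied inside the k loop, so no
             * m x k or k x n intermediate is ever stored. */
            for (size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * d[p] * B[p * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}
```

The only difference from a plain gemm triple loop is the extra factor `d[p]` in the innermost update.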

Use case:

  • For matrices with large k and relatively small m and n, explicitly forming either intermediate product, A * diag(d) or diag(d) * B, is wasteful, since each has a dimension of size k. Forming it costs both time to compute and memory to hold the large result. gemmd computes the final product without the overhead of materializing the intermediate.
  • We additionally parallelize the PC loop because this is the only loop over the k dimension. As gemmd is most useful when k is large, parallelizing this loop can have a major impact on performance in this use case.
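To illustrate why parallelizing over k needs extra care, here is a rough sketch of the general idea, not the sandbox code. Threads that split the k dimension all contribute to the same entries of C, so each needs its own partial result, followed by a reduction. The name `gemmd_kpar`, the `nchunks` parameter, the row-major layout, and the per-chunk buffer strategy are assumptions; the real BLIS implementation works on packed blocks and may handle the reduction differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch: C := A * diag(d) * B with the k dimension split into chunks.
 * Assumptions: row-major dense matrices; gemmd_kpar and nchunks are
 * hypothetical names. Returns 0 on success, -1 if allocation fails.
 * Without OpenMP enabled the pragma is ignored and the code runs serially. */
static int gemmd_kpar(size_t m, size_t n, size_t k,
                      const double *A, const double *d,
                      const double *B, double *C, int nchunks)
{
    assert(A && d && B && C && nchunks > 0);
    if (m * n == 0) return 0;

    /* One zero-initialized m x n partial result per chunk of k. */
    double *partial = calloc((size_t)nchunks * m * n, sizeof *partial);
    if (!partial) return -1;

    #pragma omp parallel for
    for (int t = 0; t < nchunks; ++t) {
        size_t p0 = k * (size_t)t / (size_t)nchunks;       /* chunk start */
        size_t p1 = k * ((size_t)t + 1) / (size_t)nchunks; /* chunk end   */
        double *Ct = partial + (size_t)t * m * n;
        for (size_t p = p0; p < p1; ++p) {
            for (size_t i = 0; i < m; ++i) {
                double a = A[i * k + p] * d[p]; /* weighted entry of A */
                for (size_t j = 0; j < n; ++j) {
                    Ct[i * n + j] += a * B[p * n + j];
                }
            }
        }
    }

    /* Reduction: sum the per-chunk partial results into C. */
    for (size_t e = 0; e < m * n; ++e) {
        double s = 0.0;
        for (int t = 0; t < nchunks; ++t) {
            s += partial[(size_t)t * m * n + e];
        }
        C[e] = s;
    }

    free(partial);
    return 0;
}
```

The extra memory is `nchunks * m * n` elements, which stays small in exactly the regime described above (large k, small m and n).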

JerryMaoQC avatar Aug 27 '21 12:08 JerryMaoQC

Sorry for the delay in looking at this, @JerryMaoQC. I do aim to get to it soon.

fgvanzee avatar Aug 31 '21 15:08 fgvanzee

@JerryMaoQC I noticed you chose the sandbox name gemmd. However, your operation is still called gemm (with APIs via bls_gemm(), bls_gemm_ex(), bls_?gemm()). Was this name (and the API names) chosen intentionally? If not, I'd be happy to help you change the filenames and function names.

fgvanzee avatar Sep 02 '21 17:09 fgvanzee

@fgvanzee and @JerryMaoQC what is the impetus to include this in BLIS mainline?

devinamatthews avatar Oct 04 '21 21:10 devinamatthews

@devinamatthews I suggested that @JerryMaoQC could submit it since there is no harm (that I could see) in having the extra sandbox directory there for posterity and in case others want to study and/or build on his work.

fgvanzee avatar Oct 04 '21 22:10 fgvanzee

Sure. No objection, I was just curious.

devinamatthews avatar Oct 04 '21 22:10 devinamatthews