
GEMMLib Stream Version

Open mattfel1 opened this issue 6 years ago • 0 comments

Currently this is written as:

      /** On-chip GEMM */
      Stream.Foreach(M par MP, N par NP){(i,j) =>
        val prod = Reduce(Reg[T])(K by 1 par KP){k => getA(i,k) * getB(k,j) }{_+_}
        val out = prod.value*alpha + getC(i,j)*beta
        storeY(i,j, out)
      }

This looks like a misuse of the Stream controller as it currently works: both child stages will run in parallel, because there is no enqueueable memory between them. The intention seems to be to use the Reg as that interface. Should the compiler recognize this and replace the Reg with an enqueueable memory? Should there be a special RegNew that lets you use it this way? Or should it be rewritten as below, with a local RegNew created to handle the accumulation and the write to the FIFO being an enq:

      /** On-chip GEMM */
      Stream.Foreach(M par MP, N par NP){(i,j) =>
        val prod = FIFO[T](2)
        Reduce(prod)(K by 1 par KP){k => getA(i,k) * getB(k,j) }{_+_}
        val out = prod.deq*alpha + getC(i,j)*beta
        storeY(i,j, out)
      }
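
To make the intended handoff concrete, here is a rough software-only model in plain Scala (not Spatial, and not how the compiler would lower it). A `mutable.Queue` stands in for the FIFO; `GemmFifoModel`, `gemm`, and the `Double` element type are hypothetical names and choices made just for this sketch. It only shows that the first stage accumulates locally and enqueues one finished value per `(i,j)`, and that the second stage consumes exactly that value. It runs the stages back to back, so it says nothing about parallel scheduling.

```scala
import scala.collection.mutable

// Software-only model (not Spatial): a queue stands in for the FIFO between stages.
object GemmFifoModel {
  type Matrix = Array[Array[Double]]

  def gemm(a: Matrix, b: Matrix, c: Matrix, alpha: Double, beta: Double): Matrix = {
    val m = a.length
    val k = b.length
    val n = b(0).length
    val y = Array.ofDim[Double](m, n)
    // Models FIFO[T](2); the depth of 2 is not enforced in this sketch.
    val prod = mutable.Queue.empty[Double]

    for (i <- 0 until m; j <- 0 until n) {
      // Stage 1: accumulate in a local variable, then enqueue one finished dot product.
      var acc = 0.0
      for (kk <- 0 until k) acc += a(i)(kk) * b(kk)(j)
      prod.enqueue(acc)

      // Stage 2: dequeue that value, then scale and add the C term.
      y(i)(j) = prod.dequeue() * alpha + c(i)(j) * beta
    }
    y
  }
}
```

The point of the queue is that the second stage only ever sees completed sums, which a bare register shared between two concurrently running stages would not guarantee.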

If I remember correctly, the goal here was to squeeze the initiation interval (II) of the first stage so that new data can be issued every cycle when this loop is parallelized.

mattfel1, Jun 21 '18 21:06