GEMMLib Stream Version
Currently this is written as:
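/** On-chip GEMM */
Stream.Foreach(M par MP, N par NP){(i,j) =>
  // Stage 1: dot product of row i of A and column j of B, accumulated in a Reg
  val prod = Reduce(Reg[T])(K by 1 par KP){k => getA(i,k) * getB(k,j) }{_+_}
  // Stage 2: scale by alpha and beta, then store
  val out = prod.value*alpha + getC(i,j)*beta
  storeY(i,j, out)
}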
This looks like a misuse of the Stream controller as it currently works: both child stages will run in parallel, because there is no enqueueable memory between them. It looks like the intention is to use the Reg as that interface. Should the compiler recognize this pattern and replace the Reg with an enqueueable memory? Or should there be a special RegNew that lets you use it this way? Or should the code be rewritten as follows, with a local RegNew created to handle the accumulation and the write to the FIFO being an enq:
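/** On-chip GEMM */
Stream.Foreach(M par MP, N par NP){(i,j) =>
  val prod = FIFO[T](2)
  Pipe {
    // Local Reg handles the accumulation; the finished sum is enqueued to the FIFO
    val acc = Reduce(Reg[T])(K by 1 par KP){k => getA(i,k) * getB(k,j) }{_+_}
    prod.enq(acc.value)
  }
  val out = prod.deq*alpha + getC(i,j)*beta
  storeY(i,j, out)
}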
If I remember correctly, the goal here was to squeeze the initiation interval (II) of the first stage so that new data can be issued every cycle when this loop is parallelized.
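To make the intended handoff concrete, here is a rough software model in plain Scala (not Spatial, and not what the compiler generates). It assumes the second form above: stage 1 accumulates locally and enqueues the finished dot product, and stage 2 dequeues it. The object name StreamGemmModel, the Double element type, and the use of ArrayBlockingQueue as a stand-in for FIFO[T](2) are all illustrative assumptions. Parallelization (MP, NP, KP) is ignored.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Plain-Scala software model (hypothetical, not Spatial): two stages joined by a depth-2 queue.
object StreamGemmModel {
  def gemm(a: Array[Array[Double]], b: Array[Array[Double]], c: Array[Array[Double]],
           alpha: Double, beta: Double): Array[Array[Double]] = {
    val m = a.length
    val k = b.length
    val n = b(0).length
    // Bounded queue standing in for FIFO[T](2): put blocks when full, take blocks when empty.
    val fifo = new ArrayBlockingQueue[Double](2)
    val y = Array.ofDim[Double](m, n)

    // Stage 1: reduce over k in a local accumulator, then enqueue the finished sum.
    val producer = new Thread(() => {
      for (i <- 0 until m; j <- 0 until n) {
        var acc = 0.0
        for (kk <- 0 until k) acc += a(i)(kk) * b(kk)(j)
        fifo.put(acc)
      }
    })
    producer.start()

    // Stage 2: dequeue in the same (i, j) order, apply alpha and beta, and store.
    for (i <- 0 until m; j <- 0 until n) {
      y(i)(j) = fifo.take() * alpha + c(i)(j) * beta
    }
    producer.join()
    y
  }
}
```

The point of the model is that the queue is what orders the two stages: stage 2 cannot read a value before stage 1 has finished producing it, which a bare register shared between concurrently running stages does not guarantee.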