Communication Between TPU Cores and Encoder->Reduce->Decoder Pattern
My understanding from the README is that there is some flexibility in the TPU mesh, but all operations must be replicated on all TPU cores.
Will there ever be support for reducing the output of an encoder that is split across 8 cores, and then running a decoder on a single core?
Effectively, the graph would take an input of shape (cores * bs, other shapes) and produce an output of shape (1, other shapes). An example use case would be encoding a set of tweets and outputting a single summary.
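To make the shapes concrete, here is a rough sketch of the data flow I have in mind. It is plain NumPy, not Mesh TensorFlow code: the 8 cores are simulated with a Python loop, and the sizes, the weights, and the `encode`, `reduce_shards`, and `decode` helpers are all made up for illustration. The mean is only one possible choice of reduction.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
CORES, BS, D_IN, D_HID = 8, 4, 16, 32

rng = np.random.default_rng(0)
w_enc = rng.normal(size=(D_IN, D_HID))   # stand-in encoder weights
w_dec = rng.normal(size=(D_HID, D_IN))   # stand-in decoder weights


def encode(shard):
    """Per-core encoder: (bs, d_in) -> (bs, d_hid)."""
    return np.tanh(shard @ w_enc)


def reduce_shards(encoded):
    """Reduce across cores and batch: (cores, bs, d_hid) -> (1, d_hid)."""
    return encoded.mean(axis=(0, 1))[None, :]


def decode(summary):
    """Single-core decoder: (1, d_hid) -> (1, d_in)."""
    return summary @ w_dec


# Global input of shape (cores * bs, d_in), split into one shard per core.
inputs = rng.normal(size=(CORES * BS, D_IN))
shards = inputs.reshape(CORES, BS, D_IN)

# The loop stands in for the 8 cores running the encoder in parallel.
encoded = np.stack([encode(shard) for shard in shards])

summary = reduce_shards(encoded)   # cross-core reduction
output = decode(summary)           # would run on one core only

print(inputs.shape, "->", output.shape)
```

The part I am asking about is the last step: after the reduction, only one core would have useful work to do, rather than every core repeating the decoder.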