Assi Jakoby

3 issues by Assi Jakoby

### Description

When running the t2t-decoder script (En-De, transformer-big) on a model that was trained on 8 GPUs using DistributedMirrorStrategy, I get the following error: `ValueError: Tensor("body/parallel_0/body/decoder/layer_0/self_attention/multihead_attention/dot_product_attention/attention:0", shape=(), dtype=string, device=/device:GPU:0)`...

### Description

When working with t2t 1.15.7 on TensorFlow 2.2 and training on 1 GPU, the model weights are ~211M, but when we increase the number of GPUs, the...

Hi, does Mesh TensorFlow support multi-node training (i.e., each node has #x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...