Assi Jakoby
### Description When running the t2t-decoder script (En-De transformer-big) on a model that was trained on 8 GPUs using DistributedMirrorStrategy, I get the following error: ValueError: Tensor("body/parallel_0/body/decoder/layer_0/self_attention/multihead_attention/dot_product_attention/attention:0", shape=(), dtype=string, device=/device:GPU:0)...
### Description When training with t2t 1.15.7 on TensorFlow 2.2 on 1 GPU, the model weights are ~211M, but when we increase the number of GPUs the...
Hi, does Mesh TensorFlow support multi-node training (i.e., each node has x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...