mesh
Mesh TensorFlow: Model Parallelism Made Easier
I am interested in getting mesh tensorflow to work with OpenNMT. I would like to do model parallelism on a multi-node GPU cluster. Is this possible? Thanks.
`mtf.dropout(x, 0.1)` drops units with 90% probability, whereas `tf.dropout(x, 0.1)` drops them with 10% probability. For around a month, this caused an agonizing bug in a GPT project that was...
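The mismatch described in this issue comes down to whether the second argument is read as a keep probability or as a drop rate. The following is a minimal sketch of the two conventions in plain Python. The helper names `dropout_keep_prob` and `dropout_rate` are hypothetical and exist only for illustration; this is not the Mesh TensorFlow or TensorFlow implementation.

```python
import random


def dropout_keep_prob(values, keep_prob, rng):
    """Inverted dropout where the argument is the probability of KEEPING a unit.

    Hypothetical helper: mirrors the convention the issue attributes to
    `mtf.dropout(x, 0.1)`, where 0.1 keeps roughly 10% of the units.
    """
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def dropout_rate(values, rate, rng):
    """Inverted dropout where the argument is the probability of DROPPING a unit.

    Hypothetical helper: mirrors the convention the issue attributes to
    `tf.dropout(x, 0.1)`, where 0.1 drops roughly 10% of the units.
    """
    return dropout_keep_prob(values, 1.0 - rate, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [1.0] * 10000

# Fraction of units that survive under each reading of the same value, 0.1.
kept_fraction_keep_prob = sum(
    1 for v in dropout_keep_prob(x, 0.1, rng) if v != 0.0
) / len(x)
kept_fraction_rate = sum(
    1 for v in dropout_rate(x, 0.1, rng) if v != 0.0
) / len(x)

print("keep_prob reading:", kept_fraction_keep_prob)
print("rate reading:     ", kept_fraction_rate)
```

Passing the same `0.1` to both helpers gives very different surviving fractions, which is why porting code between the two conventions without converting the argument (`keep_prob = 1 - rate`) can silently change training behavior.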
I read the Switch Transformers paper as carefully as possible. However, none of these parameters are listed in a glossary or clearly defined in the [code](https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/moe.py) or the paper. For example, you have the following...
You should also specify `total_train_steps` here: https://github.com/tensorflow/mesh/blob/d91460615e32cf13077f94a868a8324f63fe758e/mesh_tensorflow/transformer/utils.py#L672-L676
I am currently trying to return logits along with predictions from `sample_autoregressive` so that I can calculate scores from them. However, the scores calculated from these logits are slightly different from the...
**Background:** I was using the T5 model and wanted to get the scores at inference mode along with the generated text. However, this feature is not supported by T5 at...
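One common way to turn per-step logits into a sequence score is to sum the log-softmax value of each generated token. The sketch below shows that calculation in plain Python. It assumes the logits are already available as one list per decoding step; the function names and the toy inputs are hypothetical, and this is not how T5 or `sample_autoregressive` computes or exposes scores.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def sequence_score(step_logits, token_ids):
    """Sum of log-probabilities of the chosen token at each decoding step.

    step_logits: list of per-step logit lists (assumed input layout).
    token_ids:   the token index selected at each step.
    """
    return sum(
        log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)
    )


# Toy example: two decoding steps over a three-token vocabulary.
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
token_ids = [0, 2]
score = sequence_score(step_logits, token_ids)
print("sequence log-probability:", score)
```

Small differences between scores computed this way and scores from another code path can come from details such as temperature scaling, precision, or which positions are included, so those are worth checking first when two numbers disagree.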
Hi, does Mesh TensorFlow support multi-node training (i.e., each node has #x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...
When running the transformer, there is no bias in SelfAttention, whereas mesh_tensorflow/bert has a bias in its self-attention. What is the meaning of relative_attention_type in transformer_layer.SelfAttention? How can I get a bias in transformer_layer.SelfAttention?
In this [example](https://github.com/tensorflow/mesh#how-do-i-pick-a-layout), we can see how to set the layout to be automatically picked. However, when using this in my `model_fn`, basically replacing [this line](https://github.com/tensorflow/mesh/blob/master/examples/mnist.py#L123), I find myself with...
This PR adds spectral operations needed for the [flowpm](https://github.com/modichirag/flowpm/tree/mesh) project in Mesh TensorFlow, which was the subject of this blog post: https://blog.tensorflow.org/2020/03/simulating-universe-in-tensorflow.html. These operations are useful for lots of applications...