
Mesh TensorFlow: Model Parallelism Made Easier

Results 99 mesh issues

I am interested in getting Mesh TensorFlow to work with OpenNMT. I would like to do model parallelism on a multi-node GPU cluster. Is this possible? Thanks.

`mtf.dropout(x, 0.1)` drops each element with 90% probability (0.1 is treated as the keep probability), whereas `tf.dropout(x, 0.1)` drops each element with 10% probability. For about a month, this has caused an agonizing bug in a GPT project that was...
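
To make the difference concrete, here is a minimal NumPy sketch of the two conventions described above: one helper interprets its argument as a keep probability, the other as a drop rate. Both function names are hypothetical, and neither is the Mesh TensorFlow or TensorFlow implementation.

```python
import numpy as np

# Hypothetical helpers contrasting the two conventions; not library code.

def dropout_keep_prob(x, keep_prob, rng):
    """Keep each element with probability keep_prob (0.1 -> drop ~90%)."""
    mask = rng.random(x.shape) < keep_prob
    # Inverted dropout: rescale survivors so the expected value is unchanged.
    return np.where(mask, x / keep_prob, 0.0)

def dropout_rate(x, rate, rng):
    """Drop each element with probability rate (0.1 -> drop ~10%)."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
dropped_keep = np.mean(dropout_keep_prob(x, 0.1, rng) == 0.0)
dropped_rate = np.mean(dropout_rate(x, 0.1, rng) == 0.0)
print(f"keep_prob=0.1 drops {dropped_keep:.1%}")  # roughly 90%
print(f"rate=0.1 drops {dropped_rate:.1%}")        # roughly 10%
```

The same argument value, 0.1, yields opposite amounts of dropout depending on which convention the function uses, so it is worth checking the parameter name whenever code is ported between the two libraries.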

I read the Switch Transformers paper as carefully as I could. However, none of these parameters are clearly defined or documented in either the [code](https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/moe.py) or the paper. For example, you have the following...
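
For reference, the Switch Transformers paper does give one formula involving the capacity factor: expert capacity = (tokens per batch / number of experts) × capacity factor. A minimal sketch of that arithmetic follows. The function name is hypothetical, rounding up with `math.ceil` is an assumption, and this is not code taken from `moe.py`.

```python
import math

def expert_capacity(tokens_per_batch: int, num_experts: int,
                    capacity_factor: float) -> int:
    """Maximum number of tokens each expert may process.

    Follows the formula stated in the Switch Transformers paper;
    rounding up with math.ceil is an assumption of this sketch.
    """
    return math.ceil(tokens_per_batch / num_experts * capacity_factor)

# Example: 1024 tokens routed across 8 experts.
print(expert_capacity(1024, 8, 1.0))   # 128
print(expert_capacity(1024, 8, 1.25))  # 160
```

A capacity factor above 1.0 leaves slack for uneven routing, at the cost of extra computation and padding per expert.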

You should also specify `total_train_steps` here: https://github.com/tensorflow/mesh/blob/d91460615e32cf13077f94a868a8324f63fe758e/mesh_tensorflow/transformer/utils.py#L672-L676

I am currently trying to return logits along with the predictions from `sample_autoregressive` so that I can calculate scores from them. However, the scores calculated from these logits are slightly different from the...
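
A common way to turn per-step logits into a sequence score is to take the log-softmax at each step and sum the log-probabilities of the tokens that were actually emitted. The NumPy sketch below shows that computation under the assumption that the logits have shape `[length, vocab_size]`; the helper names are hypothetical, and the sketch does not model any temperature scaling or sampling noise the decoder may apply, which is one place a mismatch could come from.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sequence_score(logits, token_ids):
    """Sum of log-probabilities of the chosen tokens.

    logits: float array of shape [length, vocab_size]
    token_ids: int array of shape [length]
    """
    log_probs = log_softmax(logits)
    # Pick out log p(token_t) at every step t.
    chosen = log_probs[np.arange(len(token_ids)), token_ids]
    return chosen.sum()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
tokens = np.array([0, 1])
print(sequence_score(logits, tokens))
```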

**Background:** I was using the T5 model and wanted to get the scores in inference mode along with the generated text. However, this feature is not supported by T5 at...

cla: yes

Hi, does Mesh TensorFlow support multi-node training (i.e., each node has #x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...
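
A 2-node × 8-GPU setup can be viewed as a two-dimensional mesh of 16 devices. The sketch below only shows a row-major mapping from mesh coordinates to TensorFlow-style device strings; the `worker` job name, the mesh dimension names, and both helpers are hypothetical, and the sketch does not establish whether Mesh TensorFlow supports this configuration.

```python
def device_for_coordinate(node: int, gpu: int) -> str:
    # TensorFlow-style device string; the job name "worker" is an assumption.
    return f"/job:worker/task:{node}/device:GPU:{gpu}"

def build_device_list(num_nodes: int, gpus_per_node: int) -> list:
    """Flatten a [num_nodes, gpus_per_node] mesh into a row-major device list."""
    return [device_for_coordinate(n, g)
            for n in range(num_nodes)
            for g in range(gpus_per_node)]

mesh_shape = {"nodes": 2, "gpus": 8}  # hypothetical mesh dimension names
devices = build_device_list(mesh_shape["nodes"], mesh_shape["gpus"])
print(len(devices))  # 16
print(devices[0])    # /job:worker/task:0/device:GPU:0
print(devices[8])    # /job:worker/task:1/device:GPU:0
```

With this ordering, splitting a tensor dimension over `gpus` keeps communication inside a node, while splitting over `nodes` crosses the network; which is preferable depends on the model.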

When running the transformer, there is no bias in `SelfAttention`, whereas `mesh_tensorflow/bert` does have a bias in its self-attention. What does `relative_attention_type` mean in `transformer_layer.SelfAttention`? How can I get a bias in `transformer_layer.SelfAttention`?
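
To illustrate what a bias in self-attention usually refers to, here is a minimal NumPy sketch of single-head self-attention in which the query, key, and value projections can optionally add a bias vector. It is not how `transformer_layer.SelfAttention` or `mesh_tensorflow/bert` is implemented, and all names are hypothetical. In T5-style models, relative position biases are a separate mechanism added to the attention logits, and this sketch does not model them.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv, bq=None, bk=None, bv=None):
    """Single-head scaled dot-product self-attention.

    x: [length, d_model]; w*: [d_model, d_head]; b*: optional [d_head] biases.
    Passing None for a bias gives a bias-free projection.
    """
    q = x @ wq + (bq if bq is not None else 0.0)
    k = x @ wk + (bk if bk is not None else 0.0)
    v = x @ wv + (bv if bv is not None else 0.0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
no_bias = self_attention(x, wq, wk, wv)
with_bias = self_attention(x, wq, wk, wv, bv=np.ones(8))
print(no_bias.shape)  # (4, 8)
```

Because each row of attention weights sums to 1, a value bias shifts every output position by exactly that bias vector, whereas query and key biases change the attention pattern itself.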

In this [example](https://github.com/tensorflow/mesh#how-do-i-pick-a-layout), we can see how to have the layout picked automatically. However, when I use this in my `model_fn` (essentially replacing [this line](https://github.com/tensorflow/mesh/blob/master/examples/mnist.py#L123)), I end up with...
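
For readers new to layouts: a layout maps tensor dimension names to mesh dimension names, and each mapped tensor dimension is split across that mesh dimension, while unmapped dimensions stay whole on every device. The pure-Python sketch below computes the resulting per-device shape. The helper names are hypothetical, the `name:value;...` strings only mimic the style used in the Mesh TensorFlow examples, and automatic layout selection is not modeled.

```python
def parse_pairs(spec: str) -> dict:
    """Parse 'a:x;b:y' into {'a': 'x', 'b': 'y'}."""
    return dict(item.split(":") for item in spec.split(";") if item)

def shard_shape(tensor_shape: dict, mesh_spec: str, layout_spec: str) -> dict:
    """Per-device shape of a tensor under a layout (hypothetical helper).

    A tensor dimension listed in the layout is split evenly across the
    mesh dimension it maps to; other dimensions are kept whole.
    """
    mesh = {name: int(size) for name, size in parse_pairs(mesh_spec).items()}
    layout = parse_pairs(layout_spec)
    result = {}
    for dim, size in tensor_shape.items():
        if dim in layout:
            num_splits = mesh[layout[dim]]
            if size % num_splits != 0:
                raise ValueError(f"{dim}={size} not divisible by {num_splits}")
            result[dim] = size // num_splits
        else:
            result[dim] = size
    return result

print(shard_shape({"batch": 64, "hidden": 1024},
                  mesh_spec="rows:2;cols:4",
                  layout_spec="batch:rows;hidden:cols"))
# {'batch': 32, 'hidden': 256}
```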

This PR adds the spectral operations needed for the [flowpm](https://github.com/modichirag/flowpm/tree/mesh) project in Mesh TensorFlow, which was the subject of this blog post: https://blog.tensorflow.org/2020/03/simulating-universe-in-tensorflow.html. These operations are useful for many applications...
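
As an illustration of the kind of spectral operation a particle-mesh simulation typically relies on, here is a NumPy sketch of a periodic Poisson solve in Fourier space, where the Laplacian becomes multiplication by −k², so φ̂(k) = −δ̂(k)/k². It uses NumPy's FFT rather than the operations added in this PR, assumes a cubic grid in a periodic box, and the function name is hypothetical.

```python
import numpy as np

def poisson_solve_fft(delta, box_size=1.0):
    """Solve laplacian(phi) = delta on a periodic cubic 3D grid via FFT."""
    n = delta.shape[0]  # assumes delta has shape (n, n, n)
    # Angular wavenumbers for grid spacing box_size / n.
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    delta_k = np.fft.fftn(delta)
    phi_k = np.zeros_like(delta_k)
    nonzero = k2 > 0
    # The k = 0 mode is left at zero, which fixes the solution's mean to zero.
    phi_k[nonzero] = -delta_k[nonzero] / k2[nonzero]
    return np.fft.ifftn(phi_k).real

n = 16
x = np.arange(n) / n
delta = np.sin(2 * np.pi * x)[:, None, None] * np.ones((n, n, n))
phi = poisson_solve_fft(delta)
# For a single sine mode, the Laplacian just multiplies by -(2*pi)**2.
print(np.allclose(phi, -delta / (2 * np.pi) ** 2))
```

Both the forward FFT and the inverse FFT operate on the whole grid, so in a distributed setting they need communication across the mesh dimensions the grid is split over.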

cla: yes