
Mesh TensorFlow: Model Parallelism Made Easier

Results: 99 mesh issues

Hi, to speed up training on V100 GPUs, I'd like to run mesh tf using mixed precision. While TensorFlow has an easy-to-use [automatic mixed precision](https://www.tensorflow.org/api_docs/python/tf/train/experimental/enable_mixed_precision_graph_rewrite) feature, it requires...

I would like to debug the training/fine-tuning performance of the mesh transformer on CPU/GPU. Is it possible to capture a performance profile using TensorBoard? If so, is there an example or tutorial that...

While running `mnist.py`, I found that `os.remove(zipped_filepath)` in the `download` function of `mnist_dataset.py` fails with a PermissionError. Changing the code to the following might work: ` try: os.remove(zipped_filepath)...
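The kind of guard this report describes can be sketched as follows. This is only an illustration under stated assumptions, not the change proposed in the issue (whose code is truncated above): the helper name `remove_quietly` is hypothetical, and treating a missing file as a non-error is a choice made here for the example.

```python
import os


def remove_quietly(path):
    """Try to delete `path`; return True if it was removed, False otherwise.

    Hypothetical helper for illustration, not part of mnist_dataset.py.
    """
    try:
        os.remove(path)
        return True
    except PermissionError:
        # The OS refused the deletion, for example because another
        # handle still holds the file open. Leave the file in place.
        return False
    except FileNotFoundError:
        # Nothing to delete; treated as a non-error in this sketch.
        return False
```

A caller could log a warning when the function returns `False` instead of aborting the download step.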

This paper [Low-Rank Bottleneck in Multi-head Attention Models](https://arxiv.org/pdf/2002.07028.pdf) suggests that we could fix the head size and keep hidden size unchanged. Could you support setting `d_k`, `d_q`, `d_v` independently instead...
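To make the request concrete, the sketch below computes the projection matrix shapes when the per-head sizes are chosen independently of the hidden size. It is plain shape arithmetic, not Mesh TensorFlow code: the function name and the `W_q`/`W_k`/`W_v`/`W_o` labels are assumptions for illustration, and it assumes queries and keys share one size `d_k` so that their dot product is defined.

```python
def attention_projection_shapes(d_model, num_heads, d_k, d_v):
    """Return the weight shapes of a multi-head attention layer.

    Hypothetical helper: d_k and d_v are per-head sizes that do not
    have to equal d_model // num_heads.
    """
    return {
        "W_q": (d_model, num_heads * d_k),  # query projection
        "W_k": (d_model, num_heads * d_k),  # key projection
        "W_v": (d_model, num_heads * d_v),  # value projection
        "W_o": (num_heads * d_v, d_model),  # output projection back to d_model
    }


# Conventional coupling: head size shrinks as the number of heads grows.
coupled = attention_projection_shapes(512, 16, 512 // 16, 512 // 16)

# Fixed head size: 16 heads of size 64 while the hidden size stays 512.
fixed = attention_projection_shapes(512, 16, 64, 64)
```

With the fixed head size, the inner dimension `num_heads * d_k` grows to 1024 while the layer's input and output width remains 512.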

Could you please change the default value of `ignore_comments` to `False`? https://github.com/tensorflow/mesh/blob/7de6e9bc9e362d082b0d8e4b04be321a25b6f0a6/mesh_tensorflow/transformer/utils.py#L766 I'm using T5 and it took me a while to find out why some of the lines in...

In the `toy_model_tpu.py` example, `params['context']` is used to understand device assignments and host placements. Where is its value populated? `def model_fn(features, labels, mode, params): ... if FLAGS.use_tpu: ctx = params['context']`

Hi, I am using the Google T5 library, which is based on Mesh TensorFlow, to train a non-autoregressive model like BERT. The training runs without a problem, but both the prediction...

I want to run the `mnist.py` example via mpirun to use devices from different nodes. Is this actually possible?

Ran training successfully on TPU v2-8 TPU software version: nightly. Ran this with tensorflow 1.15

I have made changes to `mnist.py` in the examples section. Following the documentation on GitHub, I made the changes to achieve data parallelism and model parallelism. I have...