mesh
Mesh TensorFlow: Model Parallelism Made Easier
Thank you for your great work. I'm curious about the MoE-Transformer's static graph construction. > Q: When there are 1024 experts and the switch gating method is used, you need to build...
Hi, I want to use the 'L-BFGS' optimizer available in tf.contrib.opt.ScipyOptimizerInterface (or tfp.optimizer.lbfgs_minimize) with Mesh TensorFlow. Is there a direct way to use it?
Let's say I have the following mesh structure:
flags.DEFINE_string('mesh_shape', 'rows:3, columns:4', 'mesh shape')
flags.DEFINE_integer('image_nx_block', 3, 'The number of x blocks.')
flags.DEFINE_integer('image_ny_block', 4, 'The number of y blocks.')
flags.DEFINE_string('layout', 'image_nx_block:rows, image_ny_block:columns', 'layout...
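The flags above can be read as a mapping from tensor dimensions to mesh dimensions. The following is a minimal pure-Python sketch of that reading, not Mesh TensorFlow's own parser: `parse_pairs`, `tensor_dims`, and `slice_sizes` are hypothetical names introduced for illustration, and the sketch assumes that a tensor dimension split across a mesh dimension must be evenly divisible by that mesh dimension's size.

```python
def parse_pairs(spec):
    """Parse a string like 'a:1, b:2' into a list of (name, value) string pairs."""
    pairs = []
    for item in spec.split(","):
        name, value = item.strip().split(":")
        pairs.append((name.strip(), value.strip()))
    return pairs


# Values copied from the flag defaults in the question.
mesh_shape = {name: int(size) for name, size in parse_pairs("rows:3, columns:4")}
layout = dict(parse_pairs("image_nx_block:rows, image_ny_block:columns"))
tensor_dims = {"image_nx_block": 3, "image_ny_block": 4}  # hypothetical container

# Total number of processors is the product of the mesh dimension sizes.
num_processors = 1
for size in mesh_shape.values():
    num_processors *= size

# Per-processor slice size of each tensor dimension under the layout.
slice_sizes = {}
for dim, size in tensor_dims.items():
    mesh_dim = layout.get(dim)
    if mesh_dim is None:
        slice_sizes[dim] = size  # not split: replicated on every processor
    else:
        # Assumption: a split dimension must divide evenly.
        assert size % mesh_shape[mesh_dim] == 0
        slice_sizes[dim] = size // mesh_shape[mesh_dim]

print(num_processors, slice_sizes)
```

Under these assumptions each of the 12 processors would hold exactly one block along each of the two image block dimensions.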
Add original AI2 version of c4 v3.0.1, ND3 deduplicated with param = 0.8, and LM1B, Wiki40B, and lm_first_len512 versions of original AI2 C4 and ND3 deduped AI2 C4 for evaluation.
My objective is to take a generic NN architecture and feed it to Mesh. Since the Mesh API has support for lowering the graph to TensorFlow by using mtf.lowering, I...
Add `cast` preprocessor and add tasks for inference prompts for deduplication project.
We tried to run Mesh TensorFlow to train T5 on GPUs following the instructions in T5's repository, but the training is extremely slow.
> global_step/sec: 0.0467347
> examples/sec: 0.186939
The training...
The following two comments seem wrong; they need to be switched.
https://github.com/tensorflow/mesh/blob/4e07d5e7186626dbc56f5a6d63c5dc259f9eb9d8/mesh_tensorflow/transformer/moe.py#L423
https://github.com/tensorflow/mesh/blob/4e07d5e7186626dbc56f5a6d63c5dc259f9eb9d8/mesh_tensorflow/transformer/moe.py#L434
Allow init_from_checkpoint to accept a list of pairs, so as to enable initialization of multiple variables in the graph from the same variable in the checkpoint.
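To illustrate why a list of pairs helps here, the sketch below uses plain Python containers in place of a real checkpoint and graph. It is not the `init_from_checkpoint` implementation: `init_from_pairs`, `checkpoint`, and `graph_vars` are hypothetical names, and the sketch assumes the alternative would be a dictionary keyed by checkpoint variable name, which can hold only one target per key.

```python
def init_from_pairs(checkpoint, graph_vars, pairs):
    """Initialize graph variables from checkpoint values.

    `pairs` is a list of (checkpoint_name, graph_variable_name) tuples, so the
    same checkpoint name may appear more than once.
    """
    for ckpt_name, var_name in pairs:
        # Copy the value so the initialized variables do not share storage.
        graph_vars[var_name] = list(checkpoint[ckpt_name])
    return graph_vars


# Hypothetical stand-ins for a checkpoint and the variables in a graph.
checkpoint = {"encoder/embedding": [0.1, 0.2, 0.3]}
graph_vars = {"encoder/embedding": None, "decoder/embedding": None}

# The same checkpoint variable initializes two graph variables.
pairs = [
    ("encoder/embedding", "encoder/embedding"),
    ("encoder/embedding", "decoder/embedding"),
]
init_from_pairs(checkpoint, graph_vars, pairs)
```

A dictionary literal with the key `"encoder/embedding"` repeated would silently keep only the last entry, which is the limitation the list form avoids in this sketch.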
Allow disabling the automatic save on shutdown. The automatic save is harmful for Mesh TensorFlow, where the variables to be saved have not been updated since the previous checkpoint was written.