
Mesh TensorFlow: Model Parallelism Made Easier

Results 99 mesh issues

Can someone please explain the difference between the "predict" and "eval" functionality? Would they produce exactly the same predictions if the input model checkpoint is fixed? > predict: https://github.com/google-research/text-to-text-transfer-transformer/blob/a92261fc9a5d5461c8d6542bbea1fe8d1e4d878e/t5/models/mtf_model.py#L330 > eval:...

Remove unused vocab argument from dataset function calls.

cla: yes

This shouldn't have public changes once the diffbase is submitted.

cla: yes

A few doc fixes. Let me know if they are worth tracking and reporting.

I'm trying to fine-tune a released T5 checkpoint in float32, but I get the following error: 2020-09-03 16:33:42.380962: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Invalid argument: tensor_name = /block_018/layer_002/layer_norm/scale;...

When packing is done here: https://github.com/tensorflow/mesh/blob/6a812c8bb847e081e976533ed497c7c5016bb1ec/mesh_tensorflow/transformer/dataset.py each packed sequence has multiple examples ("segments"). I'm trying to figure out where you prevent information from leaking between these examples (e.g. in...
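To make the question concrete: one common way to keep packed examples from attending to each other is to derive an attention mask from per-position segment ids. The sketch below only illustrates that general idea in plain Python; it is not taken from Mesh TensorFlow, and the function name `segment_attention_mask` and the convention that id 0 marks padding are assumptions made for this example.

```python
def segment_attention_mask(segment_ids):
    """Return mask[i][j] = True when position i may attend to position j.

    Two positions may attend to each other only if they carry the same
    nonzero segment id. Treating id 0 as padding is an assumption of this
    sketch, not a documented Mesh TensorFlow convention.
    """
    n = len(segment_ids)
    return [
        [segment_ids[i] != 0 and segment_ids[i] == segment_ids[j] for j in range(n)]
        for i in range(n)
    ]


# Two packed examples (segments 1 and 2) followed by one padding position.
packed_segment_ids = [1, 1, 1, 2, 2, 0]
mask = segment_attention_mask(packed_segment_ids)
```

With a mask like this applied to the attention logits, positions in segment 1 cannot see positions in segment 2, and padding attends to nothing.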

The documentation of `mtf.while_loop` implies that control_dependencies doesn't work on TPUs. This was true early in the lifecycle of TPU development, but TPUs respect control_dependencies now. I emailed TFRC to...

My understanding from the readme is that there is some flexibility in the TPU Mesh, but all operations must be replicated on all TPU cores. Will there ever be support for...

1. I have an image classification model defined in Keras that I'm attempting to parallelize with MTF. However, it's not clear to me whether MTF support exists for keras.layers/tf.layers or...