mesh
Mesh TensorFlow: Model Parallelism Made Easier
Can someone please explain the difference between the "predict" and "eval" functionality? Would they produce exactly the same predictions if the input model checkpoint is fixed? > predict: https://github.com/google-research/text-to-text-transfer-transformer/blob/a92261fc9a5d5461c8d6542bbea1fe8d1e4d878e/t5/models/mtf_model.py#L330 > eval:...
Remove unused vocab argument from dataset function calls.
This shouldn't have public changes once the diffbase is submitted.
A few doc fixes. Let me know if they are worth tracking and reporting.
I'm trying to fine-tune a released T5 checkpoint in float32, but I get the following error: 2020-09-03 16:33:42.380962: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Invalid argument: tensor_name = /block_018/layer_002/layer_norm/scale;...
When packing is done here (https://github.com/tensorflow/mesh/blob/6a812c8bb847e081e976533ed497c7c5016bb1ec/mesh_tensorflow/transformer/dataset.py), each packed sequence has multiple examples ("segments"). I'm trying to figure out where you prevent information from leaking between these examples (e.g. in...
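For readers unfamiliar with the concern in the question above, here is a minimal sketch of one common way to keep packed examples from attending to each other: build a mask from per-position segment ids. This is an illustration only, not a description of what Mesh TensorFlow does internally. The function name `segment_attention_mask` and the convention that segment id 0 marks padding are assumptions made for this example.

```python
def segment_attention_mask(segment_ids):
    """Return mask[i][j] == True when position i may attend to position j.

    Hypothetical helper: a position may only attend to positions that carry
    the same non-zero segment id. Id 0 is assumed to mean padding.
    """
    return [
        [q != 0 and q == k for k in segment_ids]
        for q in segment_ids
    ]


# Two packed examples (segments 1 and 2) followed by one padding position (0).
segment_ids = [1, 1, 2, 2, 0]
mask = segment_attention_mask(segment_ids)

# Print the mask: "x" means attention is allowed, "." means it is blocked.
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real model such a mask would typically be turned into a large negative bias added to the attention logits before the softmax, so that blocked positions receive near-zero attention weight.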
The documentation of `mtf.while_loop` implies that control_dependencies doesn't work on TPUs. This was true early in the lifecycle of TPU development, but TPUs respect control_dependencies now. I emailed TFRC to...
My understanding from the README is that there is some flexibility in the TPU mesh, but all operations must be replicated on all TPU cores. Will there ever be support for...
1. I have an image classification model defined in Keras that I'm attempting to parallelize with MTF. However, it's not clear to me whether MTF support exists for keras.layers/tf.layers or...