Mesh TensorFlow: Model Parallelism Made Easier
@nikip: this is resolved, right?
mtf.expand_dims can be implemented in terms of stack; mtf.squeeze can be implemented in terms of reduce_sum plus some sanity checks.
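As a rough illustration of the idea in this issue, here is a NumPy sketch (an analogy only, not the actual Mesh TensorFlow implementation): stacking a one-element list inserts a new size-1 dimension, and summing over a size-1 dimension removes it without changing any values.

```python
import numpy as np

def expand_dims_via_stack(x, axis=0):
    # stack of a single-element list inserts a new size-1 axis at `axis`
    return np.stack([x], axis=axis)

def squeeze_via_reduce_sum(x, axis=0):
    # sanity check: only a size-1 axis can be squeezed this way
    assert x.shape[axis] == 1, "can only squeeze a size-1 dimension"
    # summing over a length-1 axis drops it and leaves values unchanged
    return np.sum(x, axis=axis)

x = np.arange(6).reshape(2, 3)
y = expand_dims_via_stack(x, axis=1)   # shape (2, 1, 3)
z = squeeze_via_reduce_sum(y, axis=1)  # shape (2, 3), equal to x
```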
Dear authors, I have read the code of auto-mesh. I found that when calculating the memory consumption for a given schedule, it only includes the memory consumed by the forward pass, but...
Is it possible to incorporate [MultiworkerMirroredStrategy](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) into Mesh TF? I would like to run model + data parallelism on a supercomputer that has multiple GPUs on multiple nodes. It seems...
Hi there, Thanks for creating this framework. I was trying to run the transformer example provided in the README.md and I realized some files are missing in the repository. Could...
[Travis is now recommending removing the __sudo__ tag](https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration): "_If you currently specify __sudo: false__ in your __.travis.yml__, we recommend removing that configuration_".
Hi, does MTF support overlapping meshes? For example, for an NN model with 6 layers, I want to parallelize the first three layers with a 1D mesh and the remaining three with a 2D...
Currently, broadcasting semantics aren't the same as in regular TensorFlow.
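For context on the difference: regular TensorFlow (like NumPy) broadcasts by aligning trailing axes positionally, whereas Mesh TensorFlow dimensions are named and binary ops match dimensions by name rather than position. A small NumPy sketch of the positional rule that Mesh TF does not follow:

```python
import numpy as np

# NumPy / regular TensorFlow align shapes from the trailing axis:
# (4, 1) and (3,) -> (4, 1) vs (1, 3) -> broadcast result (4, 3).
a = np.ones((4, 1))
b = np.ones((3,))
c = a + b
print(c.shape)  # (4, 3)
```

In Mesh TensorFlow, by contrast, a tensor's shape is a list of named Dimensions, so whether two tensors combine depends on their dimension names, not their axis positions.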
The [mtf_transformer](https://github.com/tensorflow/tensor2tensor/blob/v1.9.0/tensor2tensor/mesh_tensorflow/mtf_transformer.py#L749) in Tensor2Tensor defaults to a mesh configuration for TPUs that uses 32 cores (4 Cloud TPUs). I wasn't able to find documentation on utilizing more than a...
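For readers unfamiliar with the mesh configuration mentioned above: Mesh TensorFlow describes a mesh as a ";"-separated string of name:size pairs (e.g. "model:4;batch:8"), and the product of the sizes must equal the number of cores. A minimal parsing sketch (the helper below is hypothetical, but the string format follows mtf conventions):

```python
def mesh_size(mesh_shape):
    """Return the total core count implied by a mesh-shape string.

    e.g. "model:4;batch:8" -> 4 * 8 = 32 cores.
    """
    total = 1
    for pair in mesh_shape.split(";"):
        if pair:
            _, size = pair.split(":")
            total *= int(size)
    return total

print(mesh_size("model:4;batch:8"))  # 32
```

Scaling beyond 32 cores would then mean choosing a mesh-shape string whose sizes multiply to the larger core count, together with a layout mapping tensor dimensions onto those mesh dimensions.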