Bairen Yi

Results: 36 comments of Bairen Yi

@tfboyd Hi Toby, do you know any successful configurations that enable XLA on multiple workers? I kept running into errors like ``Could not colocate node with its resource and reference...

Take a look at the BERT training recipe [here](https://github.com/llvm/torch-mlir/blob/main/build_tools/torchscript_e2e_heavydep_tests/train_models.py#L153-L209). It combines 2 different techniques from PyTorch core:

* `torch.nn.utils.stateless.functional_call` to transform `model.forward(inputs)` to `forward(model, inputs)` (see the sketch below)
* `torch.fx.experimental.proxy_tensor.make_fx` to transform a...
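For illustration, here is a minimal sketch of how those two pieces compose on a toy `torch.nn.Linear` module (the module, tensor names, and shapes here are made up for the example; the linked recipe does the same for BERT):

```python
import torch
from torch.nn.utils.stateless import functional_call
from torch.fx.experimental.proxy_tensor import make_fx

model = torch.nn.Linear(4, 2)

# Rewrite the stateful model.forward(x) as a pure function of
# (parameters, inputs) using functional_call.
def forward(weight, bias, x):
    return functional_call(model, {"weight": weight, "bias": bias}, (x,))

x = torch.randn(3, 4)

# make_fx traces the functionalized forward into an FX GraphModule of
# ATen ops, which downstream lowerings (e.g. torch-mlir) can consume.
fx_graph = make_fx(forward)(model.weight, model.bias, x)
print(fx_graph.graph)
```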

AOT Autograd currently doesn't support autocast, and we confirmed this through the torch->xla lowering. For example, a simple conv2d:

### PyTorch module

```python
model = torch.nn.Conv2d(16, 33, 3, stride=2)
compiled_module = ...
```
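For context, a minimal sketch of the kind of setup being discussed (the compiler callback and input shapes below are placeholders, not the exact script used for the lowering): compile the module with functorch's AOT Autograd and run it under autocast, then inspect whether the casts show up in the captured graphs.

```python
import torch
from functorch.compile import aot_module, make_boxed_compiler

@make_boxed_compiler
def print_compiler(fx_module, example_inputs):
    # Dump the traced forward/backward graph so the autocast behaviour
    # (or lack of it) is visible in the captured ATen ops.
    fx_module.graph.print_tabular()
    return fx_module

model = torch.nn.Conv2d(16, 33, 3, stride=2)
compiled_module = aot_module(model, fw_compiler=print_compiler,
                             bw_compiler=print_compiler)

x = torch.randn(1, 16, 50, 100)
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = compiled_module(x)
out.float().sum().backward()
```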

IMHO there are 2 possible fixes:

1. return weights with casted dtypes in forward, and feed them directly to backward (see the sketch below)
2. fix autocast for backward ops in core, possibly having...
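To make fix 1 concrete, here is a hand-written sketch of the idea at the Python level (a toy matmul in a custom `autograd.Function`, not the actual AOT Autograd change): the forward casts the weight once, saves the casted tensor, and the backward consumes that saved tensor directly instead of relying on autocast being replayed for backward ops.

```python
import torch

class CastedMatmul(torch.autograd.Function):
    """Toy illustration of fix 1: save the *casted* weight in forward so the
    backward uses it directly, with no autocast needed for backward ops."""

    @staticmethod
    def forward(ctx, x, weight):
        x_lp = x.to(torch.bfloat16)
        w_lp = weight.to(torch.bfloat16)
        ctx.save_for_backward(x_lp, w_lp)
        ctx.orig_dtypes = (x.dtype, weight.dtype)
        return x_lp @ w_lp.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_lp, w_lp = ctx.saved_tensors
        x_dtype, w_dtype = ctx.orig_dtypes
        # Gradients are computed against the already-casted tensors and then
        # cast back to the original dtypes expected by autograd.
        grad_x = (grad_out @ w_lp).to(x_dtype)
        grad_w = (grad_out.t() @ x_lp).to(w_dtype)
        return grad_x, grad_w

x = torch.randn(8, 4)
weight = torch.randn(2, 4, requires_grad=True)
out = CastedMatmul.apply(x, weight)   # bfloat16 output
out.float().sum().backward()
print(weight.grad.dtype)              # torch.float32
```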

> @tmabraham thanks, might take you up on that, currently thinking through the abstractions, trying to hide most of cuda + distributed config vs xla + distributed config without making...

https://github.com/NVIDIA/tensorflow/commit/9932ec367f0704cacc7340e6110eeaf385124365

> TFRT will support different flavors of kernels: codegen'ed (via xla, mlir, or other technology), hand-written (e.g. via Eigen), and library-based (e.g. by calling into cuDNN).
>
> codegen'ed kernels...

It’s not limited to TCP/IP in the sense that the communication protocol could be made pluggable in the distributed runtime.
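As one concrete reference point (an assumption about what "pluggable" could look like, not a statement about any new runtime), TensorFlow's existing distributed server already exposes the transport as a string, so a build with RDMA support can swap the protocol without touching the cluster definition:

```python
import tensorflow as tf

# Hypothetical two-worker cluster; host names and ports are placeholders.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]
})

server = tf.distribute.Server(
    cluster,
    job_name="worker",
    task_index=0,
    protocol="grpc",  # builds with RDMA support accept e.g. "grpc+verbs" here
)
server.join()
```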

Thanks! Is there any way to test it out, or can it only be run on your internal servers for the moment?