Bairen Yi

Results: 36 comments of Bairen Yi

@tfboyd Hi Toby, do you know any successful configurations that enable XLA on multiple workers? I kept running into errors like ``Could not colocate node with its resource and reference...

Take a look at the BERT training recipe [here](https://github.com/llvm/torch-mlir/blob/main/build_tools/torchscript_e2e_heavydep_tests/train_models.py#L153-L209). It combines 2 different techniques from PyTorch core:

* `torch.nn.utils.stateless.functional_call` to transform `model.forward(inputs)` to `forward(model, inputs)` (see the sketch below)
* `torch.fx.experimental.proxy_tensor.make_fx` to transform a...
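For illustration, here is a minimal sketch of how those two pieces compose on a toy `torch.nn.Linear` module (the module, tensor names, and shapes here are made up for the example; the linked recipe does the same for BERT):

```python
import torch
from torch.nn.utils.stateless import functional_call
from torch.fx.experimental.proxy_tensor import make_fx

model = torch.nn.Linear(4, 2)

# Rewrite the stateful model.forward(x) as a pure function of
# (parameters, inputs) using functional_call.
def forward(weight, bias, x):
    return functional_call(model, {"weight": weight, "bias": bias}, (x,))

x = torch.randn(3, 4)

# make_fx traces the functionalized forward into an FX GraphModule of
# ATen ops, which downstream lowerings (e.g. torch-mlir) can consume.
fx_graph = make_fx(forward)(model.weight, model.bias, x)
print(fx_graph.graph)
```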

AOT Autograd currently doesn't support autocast, and we confirmed this through the torch->xla lowering. For example, a simple conv2d:

### PyTorch module

```python
model = torch.nn.Conv2d(16, 33, 3, stride=2)
compiled_module = ...
```
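For context, a minimal sketch of the kind of setup being discussed (the compiler callback and input shapes below are placeholders, not the exact script used for the lowering): compile the module with functorch's AOT Autograd and run it under autocast, then inspect whether the casts show up in the captured graphs.

```python
import torch
from functorch.compile import aot_module, make_boxed_compiler

@make_boxed_compiler
def print_compiler(fx_module, example_inputs):
    # Dump the traced forward/backward graph so the autocast behaviour
    # (or lack of it) is visible in the captured ATen ops.
    fx_module.graph.print_tabular()
    return fx_module

model = torch.nn.Conv2d(16, 33, 3, stride=2)
compiled_module = aot_module(model, fw_compiler=print_compiler,
                             bw_compiler=print_compiler)

x = torch.randn(1, 16, 50, 100)
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = compiled_module(x)
out.float().sum().backward()
```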

IMHO there are 2 possible fixes:

1. return weights with casted dtypes in forward, and feed them directly to backward (see the sketch below)
2. fix autocast for backward ops in core, possibly having...
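To make fix 1 concrete, here is a hand-written sketch of the idea at the Python level (a toy matmul in a custom `autograd.Function`, not the actual AOT Autograd change): the forward casts the weight once, saves the casted tensor, and the backward consumes that saved tensor directly instead of relying on autocast being replayed for backward ops.

```python
import torch

class CastedMatmul(torch.autograd.Function):
    """Toy illustration of fix 1: save the *casted* weight in forward so the
    backward uses it directly, with no autocast needed for backward ops."""

    @staticmethod
    def forward(ctx, x, weight):
        x_lp = x.to(torch.bfloat16)
        w_lp = weight.to(torch.bfloat16)
        ctx.save_for_backward(x_lp, w_lp)
        ctx.orig_dtypes = (x.dtype, weight.dtype)
        return x_lp @ w_lp.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_lp, w_lp = ctx.saved_tensors
        x_dtype, w_dtype = ctx.orig_dtypes
        # Gradients are computed against the already-casted tensors and then
        # cast back to the original dtypes expected by autograd.
        grad_x = (grad_out @ w_lp).to(x_dtype)
        grad_w = (grad_out.t() @ x_lp).to(w_dtype)
        return grad_x, grad_w

x = torch.randn(8, 4)
weight = torch.randn(2, 4, requires_grad=True)
out = CastedMatmul.apply(x, weight)   # bfloat16 output
out.float().sum().backward()
print(weight.grad.dtype)              # torch.float32
```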

> @tmabraham thanks, might take you up on that, currently thinking through the abstractions, trying to hide most of cuda + distributed config vs xla + distributed config without making...

https://github.com/NVIDIA/tensorflow/commit/9932ec367f0704cacc7340e6110eeaf385124365

> TFRT will support different flavors of kernels: codegen'ed (via xla, mlir, or other technology), hand-written (e.g. via Eigen), and library-based (e.g. by calling into cuDNN).
>
> codegen'ed kernels...

It’s not limited to TCP/IP in the sense that the communication protocol could be made pluggable in the distributed runtime.
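As one concrete reference point (an assumption about what "pluggable" could look like, not a statement about any new runtime), TensorFlow's existing distributed server already exposes the transport as a string, so a build with RDMA support can swap the protocol without touching the cluster definition:

```python
import tensorflow as tf

# Hypothetical two-worker cluster; host names and ports are placeholders.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]
})

server = tf.distribute.Server(
    cluster,
    job_name="worker",
    task_index=0,
    protocol="grpc",  # builds with RDMA support accept e.g. "grpc+verbs" here
)
server.join()
```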

Thanks! Is there any way to test it out, or can it only be run on your internal servers for the moment?