Yanming W.
@codeislife99 FYI, previous thread about the same issue: https://github.com/pytorch/xla/issues/3347. It looks like PyTorch now requires `CUDA_VISIBLE_DEVICES` to be set before importing torch.
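For anyone hitting this, a minimal sketch of the ordering that works (the device string is illustrative):

```python
import os

# Must be set before the first `import torch`, since PyTorch reads it at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # hypothetical: expose only GPU 0

import torch  # imported only after the env var is set

print(torch.cuda.device_count())  # reflects the visibility set above
```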
@codeislife99 It looks like this is a bug in PyTorch 1.12 that has since been fixed: https://github.com/pytorch/pytorch/issues/80876. We need to figure out how to not rely on `CUDA_VISIBLE_DEVICES` to set...
Maybe some environment issue? I couldn't reproduce it in the CI docker image `gcr.io/tpu-pytorch/xla_base:latest-d8db50a778a39fab0a58436307a3225a6ca06f67` with `pip install git+https://github.com/huggingface/transformers@06a6a4b`.
It looks like TensorFlow Grappler has trouble creating clusters, and the error comes from [here](https://github.com/tensorflow/tensorflow/blob/e9db4aec6714173c1e556b701feda06cc5203380/tensorflow/core/grappler/clusters/virtual_cluster.cc#L50). This step should happen during compilation, so maybe XLA can skip these optimizations if...
I think this op may not need to be implemented as an XLA custom call. I found it can be lowered using broadcast + diff + reduce. You can check out...
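I don't have the exact op handy here, but as an illustration of the broadcast + diff + reduce pattern, a cdist-style pairwise squared distance can be expressed like this (a sketch, not the actual lowering):

```python
import torch

def pairwise_sq_dist(x, y):
    # broadcast: (n, 1, d) - (1, m, d) -> (n, m, d)
    diff = x.unsqueeze(1) - y.unsqueeze(0)
    # reduce over the feature dimension -> (n, m)
    return (diff * diff).sum(-1)

x = torch.randn(4, 3)
y = torch.randn(5, 3)
# agrees with the squared p=2 distance from torch.cdist
assert torch.allclose(pairwise_sq_dist(x, y), torch.cdist(x, y).pow(2), atol=1e-4)
```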
To fix this error, I simply reverted commit 0de6ecda97e261528b51709c11a4e7e22a39ca33. I suspect this is an XRT-related issue and is not specific to any platform.
I've created a prototype using an XLA custom call and the scipy.optimize.linear_sum_assignment C++ API: https://github.com/ymwangg/xla/commit/0bba425041ba3e664c682ebb3b430a846949b2ff. This implementation requires round-trip data transfers to the CPU to do the computation. Ideally we want everything on...
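For context, the round trip in pure Python looks roughly like this (a sketch of the pattern only; the actual prototype wires SciPy's C++ solver into an XLA custom call):

```python
import torch
from scipy.optimize import linear_sum_assignment

def lsa_via_cpu(cost: torch.Tensor):
    """Solve the assignment problem by round-tripping through host memory."""
    # SciPy's solver only accepts host arrays, so copy device -> CPU first.
    cost_cpu = cost.detach().cpu().numpy()
    row_ind, col_ind = linear_sum_assignment(cost_cpu)
    # Copy the resulting index arrays back to the original device.
    return (torch.from_numpy(row_ind).to(cost.device),
            torch.from_numpy(col_ind).to(cost.device))
```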
Yes, it defaults to 8 now: https://github.com/pytorch/xla/blob/32da64f69a9c246186603a168291d7e42e0d3884/torch_xla/__init__.py#L51-L52.
Updating your PyTorch version should solve this problem. `StorageImpl::mutable_data()` was added recently: https://github.com/pytorch/pytorch/pull/97647.
Hi @skrider, thanks for the great work! Based on my tests, this kernel is 1.5-4x faster than the Triton equivalent. But when I use it for end-to-end testing in vLLM,...