jeffhataws

63 comments by jeffhataws

PR: https://github.com/pytorch/xla/pull/8094 Reason: Fix for autocast to enable cross entropy loss with FP32 precision [Done] Cherry-pick: https://github.com/pytorch/xla/pull/8201

PR: https://github.com/pytorch/xla/pull/8204 Reason: Multi-node SPMD support for Neuron Cherry-pick: https://github.com/pytorch/xla/pull/8224

It seems the problem occurs when DEBUG=1.

Just want to document some findings from the original MLP test on CPU only, printing ``met.metric_data("InputOutputAliasCount")``:

```
pt2.1 with functionalization on CPU: (4, 42.0, ((1719204665.609187, 16.0), (1719204667.7811365, 2.0), (1719204668.486205, 12.0), ...
```
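As a side note, the printed tuple appears to follow the usual metric layout of `(num_samples, total, ((timestamp, value), ...))`. Here is a minimal, pure-Python sketch of summarizing such a tuple, assuming that layout; `summarize_alias_metric` is a hypothetical helper, not a torch_xla API, and the sample data is the truncated CPU run shown above.

```python
# Sketch: summarize a metric tuple of the assumed form
# (num_samples, total, ((timestamp, value), ...)).
# Pure Python; no torch_xla required.

def summarize_alias_metric(metric):
    count, total, samples = metric
    values = [v for _, v in samples]
    return {
        "samples": count,        # sample count reported by the metric
        "total": total,          # accumulated alias count
        "last": values[-1] if values else None,  # most recent sample value
    }

# First three samples from the CPU run above (the dump is truncated,
# so only three of the four samples are visible):
data = (4, 42.0, ((1719204665.609187, 16.0),
                  (1719204667.7811365, 2.0),
                  (1719204668.486205, 12.0)))
print(summarize_alias_metric(data))
```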

Minimal reproduction with only one linear layer and only gradient accumulation:

```python
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics...
```

Using TOT, I modified torch_xla/csrc/xla_graph_executor.cpp to dump some info:

```diff
diff --git a/torch_xla/csrc/xla_graph_executor.cpp b/torch_xla/csrc/xla_graph_executor.cpp
index 74c3270a9..d31924fb4 100644
--- a/torch_xla/csrc/xla_graph_executor.cpp
+++ b/torch_xla/csrc/xla_graph_executor.cpp
@@ -1264,6 +1264,7 @@ std::vector XLAGraphExecutor::SetBufferDonors(
   size_t tensor_index =...
```

In the functionalization graph, there is one additional output of shape (768, 10) that is not present in the no-functionalization case.

One additional data point: when I increase the gradient accumulation count, I see that the input tensor ID keeps changing at each gradient accumulation step when functionalization is on....
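A toy model of why that ID churn matters: if the buffer-donor bookkeeping is keyed by tensor ID and functionalization mints a fresh ID every accumulation step, lookups against the originally registered ID miss. This is a minimal sketch in plain Python; `find_donor` and the IDs are illustrative, not torch_xla code.

```python
def find_donor(donor_map, tensor_id):
    # Return the donated buffer registered for this tensor ID,
    # or None if no donor entry matches (an aliasing miss).
    return donor_map.get(tensor_id)

# Donor registered once for the stable input, tensor_id 3:
donor_map = {3: "grad_buffer"}

# Without functionalization the graph keeps reusing tensor_id 3,
# so the donor is found every step:
assert find_donor(donor_map, 3) == "grad_buffer"

# With functionalization each accumulation step produces a new
# input ID (e.g. 14), so the same lookup misses:
assert find_donor(donor_map, 14) is None
```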

Makes sense. I think for the second graph, the input tensor_id 14 should be aliased to tensor_id 3, instead of being aliased to itself (as indicated by the map in SetBufferDonors:...
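The suggestion above can be sketched as follows: before registering a donor, walk a functionalization alias map back to the base tensor (tensor_id 3) rather than aliasing the rewritten ID (14) to itself. `resolve_base_id` and `alias_map` are hypothetical names for illustration, not torch_xla internals.

```python
def resolve_base_id(alias_map, tensor_id):
    # Follow the alias chain until we reach a tensor that is not an
    # alias of anything else, i.e. the true graph input. The `seen`
    # set guards against accidental cycles in the map.
    seen = set()
    while tensor_id in alias_map and tensor_id not in seen:
        seen.add(tensor_id)
        tensor_id = alias_map[tensor_id]
    return tensor_id

# Functionalization recorded that tensor 14 is an updated view of tensor 3:
alias_map = {14: 3}

# Buffer donation should target the base ID, not the alias itself:
assert resolve_base_id(alias_map, 14) == 3
# A tensor with no alias entry resolves to itself:
assert resolve_base_id(alias_map, 7) == 7
```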