Jeffrey Wan

Results 36 comments of Jeffrey Wan

@pytorchbot merge -f "Preexisting failures"


It's not forward-AD specific. There's logic in `ADInplaceOrView` that checks the tensor's `support_as_strided()` method, so this applies to all views.
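A minimal sketch, assuming a recent PyTorch build. It shows that view and in-place bookkeeping also happens for an ordinary view of a plain tensor, with no forward-AD dual tensors involved. It inspects the private `_is_view()`, `_base`, and `_version` attributes. The C++ `support_as_strided()` check itself isn't visible from Python and isn't exercised directly here.

```python
import torch

# Plain tensor: no requires_grad, no forward-AD dual tensor anywhere.
x = torch.zeros(6)
y = x.view(2, 3)  # ordinary view of x

# The view relationship is recorded even without any autograd features in use.
print(y._is_view(), y._base is x)

# An in-place op on the view bumps the version counter shared with the base,
# and the write is visible through the base because they share storage.
y.add_(1)
print(x._version, x)
```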

> Is the idea that if we downcast a tensor in the forward, then we need to make sure to upcast back to its original dtype in the backward? (e.g....

Not sure how the code is still working with `no_grad`, actually, because `to` would return a tensor that doesn't require grad. The cast can happen implicitly during...
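A small sketch of both points, assuming a recent PyTorch on CPU. Outside `no_grad`, `.to()` is differentiable, and autograd casts the gradient back to the input's original dtype in the backward pass. Inside `no_grad`, the output of `.to()` does not require grad at all.

```python
import torch

x = torch.randn(4, requires_grad=True)  # float32 leaf tensor

# Outside no_grad: the downcast is recorded, and backward upcasts the gradient
# back to x's dtype (float32) before it reaches x.grad.
y = x.to(torch.bfloat16)
y.sum().backward()
print(y.requires_grad, y.dtype, x.grad.dtype)

# Inside no_grad: the cast is not tracked, so the result doesn't require grad.
with torch.no_grad():
    z = x.to(torch.bfloat16)
print(z.requires_grad)
```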

> I think the root cause is that `p.data = p.data.to(torch.bfloat16)` in PyTorch core doesn't update `p`'s metadata to have bfloat16 dtype, leaving a discrepancy between the metadata dtype and...

Yeah, I'm not too sure what's going on. Maybe it's XLA-specific? I reproduced this on Colab (1.11), but after adding the following check, it looks like the `input_metadata`...
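A minimal CPU-only sanity check, assuming a recent PyTorch release. After `p.data = p.data.to(torch.bfloat16)`, both the parameter's dtype and the gradient produced by a fresh backward pass are bfloat16. This does not reproduce the XLA behavior or inspect `input_metadata` directly; it only shows what happens with plain CPU tensors. The parameter `p` here is a hypothetical stand-in.

```python
import torch

# Hypothetical parameter; float32 by default.
p = torch.nn.Parameter(torch.randn(3))

# Replace the underlying data with a bfloat16 copy, as in the quoted report.
p.data = p.data.to(torch.bfloat16)

# A fresh forward/backward builds a new graph against the updated tensor,
# so the gradient accumulator sees the new dtype.
(p * 2).sum().backward()
print(p.dtype, p.grad.dtype)
```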

Hmm, would it be possible to wrap the functions passed in so that they can be invoked with `Function()()` but actually call `apply` inside?

```
def wrapper(fn):
    def out(*inputs):
        return...
```
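One way this suggestion could look, as a sketch only. Here the wrapper returns a zero-argument factory, so a legacy call site written as `Fn()(inputs)` ends up calling the new-style static `Fn.apply(inputs)`. The names `Square`, `factory`, and `LegacySquare` are hypothetical. The original snippet is truncated, so this does not claim to match its exact body.

```python
import torch


class Square(torch.autograd.Function):
    """Hypothetical custom Function used only for illustration."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out


def wrapper(fn):
    # Hypothetical shim: calling the result with no arguments mimics the old
    # "instantiate the Function" step; calling that with inputs runs fn.apply.
    def factory():
        def out(*inputs):
            return fn.apply(*inputs)
        return out
    return factory


LegacySquare = wrapper(Square)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = LegacySquare()(x)  # legacy-style call, dispatched to Square.apply
y.sum().backward()
print(y, x.grad)
```

Keeping the shim as nested closures means existing `Fn()(...)` call sites don't need to change while the Function itself uses the static `forward`/`backward` style.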