Jeffrey Wan
@pytorchbot merge -f "Preexisting failures"
@pytorchbot merge -g
@pytorchbot merge -f "Preexisting failures"
Hmm, you mean GradWrapper, right?
It's not forward AD specific. There's logic in ADInplaceOrView that checks the tensor's support_as_strided method, so this applies to all views.
> Is the idea that if we downcast a tensor in the forward, then we need to make sure to upcast back to its original dtype in the backward? (e.g....
Not sure how the code is still working with `no_grad` actually, because `to` would return a tensor that doesn't require grad. The cast can happen implicitly during...
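As a rough illustration of that point (a minimal sketch, not from the original thread; the tensor names are made up):
```python
import torch

p = torch.randn(3, requires_grad=True)

with torch.no_grad():
    q = p.to(torch.bfloat16)
print(q.requires_grad)  # False: the cast under no_grad isn't recorded by autograd

r = p.to(torch.bfloat16)
print(r.requires_grad)  # True: outside no_grad, `to` is differentiable, so grads flow back through the cast
```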
> I think the root cause is that `p.data = p.data.to(torch.bfloat16)` in PyTorch core doesn't update p's metadata to have bfloat16 dtype, leaving a discrepancy between the metadata dtype and...
Yeah, I'm not too sure what is going on (likely XLA specific?). I repro'd this on Colab (1.11), but from adding the following check it looks like the `input_metadata`...
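For reference, a quick eager-mode (CPU) sanity check along these lines; the actual `input_metadata` check mentioned above isn't shown in the preview, so this is just an assumed sketch:
```python
import torch

p = torch.nn.Parameter(torch.randn(3))
p.data = p.data.to(torch.bfloat16)

# On eager CPU the parameter's reported dtype follows the new storage after the
# .data swap, which is consistent with the discrepancy being backend (XLA) specific.
print(p.dtype)       # torch.bfloat16
print(p.data.dtype)  # torch.bfloat16
```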
Hmm, would it be possible to wrap the functions passed in so that they can be invoked with `Function()()` but actually call `apply` inside?
```
def wrapper(fn):
    def out(*inputs):
        return fn.apply(*inputs)
    return out
```
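For context, a self-contained sketch of how that delegation to `apply` could look end to end (the `Square` function here is just a toy stand-in, not from the PR):
```python
import torch

def wrapper(fn):
    # Repeated from the sketch above so this snippet runs standalone.
    def out(*inputs):
        return fn.apply(*inputs)
    return out

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.randn(3, requires_grad=True)
y = wrapper(Square)(x)  # calls Square.apply under the hood
y.sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True: gradient of x**2 is 2x
```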