Jeffrey Wan
@pytorchbot merge -f "Preexisting failures"
@pytorchbot merge -g
@pytorchbot merge -f "Preexisting failures"
Hmm, you mean GradWrapper, right?
It's not forward AD specific. There's logic in ADInplaceOrView that checks the tensor's support_as_strided method, so this applies to all views.
> Is the idea that if we downcast a tensor in the forward, then we need to make sure to upcast back to its original dtype in the backward? (e.g....
Not sure how the code is still working with `no_grad` actually, because `to` would return a tensor that doesn't require grad. The cast can happen implicitly during...
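As a rough illustration of that point (a minimal sketch, not from the original thread; the tensor names are made up):
```python
import torch

p = torch.randn(3, requires_grad=True)

with torch.no_grad():
    q = p.to(torch.bfloat16)
print(q.requires_grad)  # False: the cast under no_grad isn't recorded by autograd

r = p.to(torch.bfloat16)
print(r.requires_grad)  # True: outside no_grad, `to` is differentiable, so grads flow back through the cast
```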
> I think the root cause is that `p.data = p.data.to(torch.bfloat16)` in PyTorch core doesn't update p's metadata to have bfloat16 dtype, leaving a discrepancy between the metadata dtype and...
Yeah, I'm not too sure what is going on (likely XLA specific?). I repro'd this on Colab (1.11), but from adding the following check it looks like the `input_metadata`...
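For reference, a quick eager-mode (CPU) sanity check along these lines; the actual `input_metadata` check mentioned above isn't shown in the preview, so this is just an assumed sketch:
```python
import torch

p = torch.nn.Parameter(torch.randn(3))
p.data = p.data.to(torch.bfloat16)

# On eager CPU the parameter's reported dtype follows the new storage after the
# .data swap, which is consistent with the discrepancy being backend (XLA) specific.
print(p.dtype)       # torch.bfloat16
print(p.data.dtype)  # torch.bfloat16
```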
Hmm, would it be possible to wrap the functions passed in so that they can be invoked with `Function()()` but actually call `apply` inside?
```
def wrapper(fn):
    def out(*inputs):
        return fn.apply(*inputs)
    return out
```
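For context, a self-contained sketch of how that delegation to `apply` could look end to end (the `Square` function here is just a toy stand-in, not from the PR):
```python
import torch

def wrapper(fn):
    # Repeated from the sketch above so this snippet runs standalone.
    def out(*inputs):
        return fn.apply(*inputs)
    return out

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.randn(3, requires_grad=True)
y = wrapper(Square)(x)  # calls Square.apply under the hood
y.sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True: gradient of x**2 is 2x
```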