peterbell10
@isuruf looks like this will need a manual rebase. The triton pin has been updated, so CI should be able to run now.
We can just add a fallback to the lowering for cases when `torch.version.hip is not None`.
Still need the `torch.version.hip is not None` gate for `cummax` and `cummin` since `triton-rocm` doesn't have the latest `tl.scan` support.
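A minimal sketch of the gate described in the two comments above, in plain Python so it runs without a PyTorch install. `pick_lowering` and `SCAN_OPS_NEEDING_NEW_TL_SCAN` are hypothetical names, not inductor APIs, and `hip_version` stands in for the value of `torch.version.hip`. Treating every other op as safe for the scan path is an assumption made only for illustration.

```python
from typing import Optional

# Hypothetical set: ops assumed to need `tl.scan` support that
# `triton-rocm` does not have yet (taken from the comment above).
SCAN_OPS_NEEDING_NEW_TL_SCAN = {"cummax", "cummin"}


def pick_lowering(op_name: str, hip_version: Optional[str]) -> str:
    """Return "fallback" or "triton_scan" (illustrative labels only)."""
    # Mirrors the `torch.version.hip is not None` check: a non-None
    # HIP version means this is a ROCm build.
    if hip_version is not None and op_name in SCAN_OPS_NEEDING_NEW_TL_SCAN:
        return "fallback"
    # Assumption: everything else can use the Triton scan lowering.
    return "triton_scan"
```

In a real lowering, the `"fallback"` branch would dispatch to the existing ATen kernel instead of generating a Triton scan.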
@pytorchbot merge
@pytorchbot merge -i
> I was wondering whether we do need the mask, or we can define it to be as `and(-size
From the test failures, it looks like you just need to bump the tolerance for half precision in `test_torchinductor_opinfo.py`.
> We always handle negative numbers properly when doing indexing. It's not a question of handling negative numbers, it's a question of loading only the data that the kernel is...
Well, at the top of this PR stack, for example, we have `torch.logical_and(idx >= -padding[i], idx < dhw[i] + padding[i])`, which is an example of a more complicated mask and...
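To make the shape of that mask concrete, here is a pure-Python analogue for a single dimension. `padded_bounds_mask` is a hypothetical helper, and `size` and `padding` stand in for `dhw[i]` and `padding[i]`. The real code operates on tensors with `torch.logical_and`; this sketch only shows which indices the two-sided bound accepts.

```python
def padded_bounds_mask(indices, size, padding):
    """Elementwise (idx >= -padding) and (idx < size + padding)."""
    return [(idx >= -padding) and (idx < size + padding) for idx in indices]


# One dimension of extent 4 with padding 2: valid indices are -2..5.
mask = padded_bounds_mask(range(-3, 7), size=4, padding=2)
```

The lower bound sits at `-padding` rather than `0`, so a mask that only checks a single upper bound does not express the same condition.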
Hrm, looks like meta dispatch isn't available on mobile: https://github.com/pytorch/pytorch/blob/70ad64e8a644d478228aa740e03f65d6153c4074/cmake/Codegen.cmake#L87. I guess you can just wrap references to it in `#ifndef C10_MOBILE` and otherwise raise a runtime error if it's...