Jason Ansel
> I agree that the current 2D heuristics are pretty lazy. After the refactor PRs land, I can revisit those and come up with something more sensible, with data to...
> Hi, @jansel, the root cause of the test failures is described in issue #136940; those cases also fail on the main branch. Please rebase the PR to a...
@desertfire should be main reviewer on this one
@pytorchbot merge
@steven-johnson I'm implementing a Halide backend for PyTorch/torch.compile/TorchInductor. Early work-in-progress version here: https://github.com/pytorch/pytorch/pull/126417 For this backend I am using the Halide-Python bindings to define a `hl.generator` that generates the kernel...
Is dequantize impure? What is it mutating? IMO this op should be decomposed in inductor. You can register the decomp in the same place the op is defined.
Impure isn't what you are looking for. Impure means the op mutates one of its inputs, so when we functionalize we need to introduce more copies (which might increase memory...
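To make the distinction in the comment above concrete, here is a minimal sketch in plain Python. It does not use any PyTorch or Inductor API: lists stand in for tensors, and the names `scale_inplace`, `scale_functional`, and `functionalized_scale` are hypothetical, chosen only for illustration. It shows an op that mutates its input, a pure equivalent, and why turning the first into the second costs an extra copy.

```python
def scale_inplace(buf, factor):
    """Impure in the sense used above: overwrites its input buffer."""
    for i in range(len(buf)):
        buf[i] *= factor
    return buf


def scale_functional(buf, factor):
    """Pure: builds a new list and leaves the input untouched."""
    return [x * factor for x in buf]


def functionalized_scale(buf, factor):
    """Wraps the mutating op so callers see no mutation.

    The price is one extra copy of the input, which is the kind of
    additional memory traffic the comment warns about.
    """
    tmp = list(buf)  # the extra copy introduced by functionalizing
    return scale_inplace(tmp, factor)


if __name__ == "__main__":
    a = [1.0, 2.0]
    print(functionalized_scale(a, 3.0), a)
```

The wrapper and the pure version return the same values; the only difference is whether the caller's buffer survives the call.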
I don't believe we have a dont-constant-fold flag (correct me if I'm wrong, @eellison), though maybe we should.
@pytorchbot rebase Looks like tests are failing
Thanks, I'll switch to using clamp. Is there a way to get Halide to generate masked loads (using the hardware mask registers on GPUs)? In some cases many of the...