Jason Ansel
Thanks! The yindex/xindex naming is an artifact of `torch.compile`'s Triton codegen and how we map to GPU grids there. I'll swap it for Halide output, though that shouldn't matter in this...
Makes sense. I am hitting this error somewhat frequently, so a fix would be very helpful! If there is a way to get 64-bit indexing that might also fix it...
I'm kind of surprised we didn't find this earlier... won't it just result in a recompile loop until we hit the cache limit?
Since guards are always attached to a specific (fixed) code object, `co_varnames`/`co_localsplusnames` is just a constant. Therefore, we could legally generate a guard check like `fast_locals[2] == Py_NONE`, with the...
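A rough Python sketch of that idea, not the actual guard implementation: because `co_varnames` is fixed for a given code object, a local's name can be resolved to a slot index once, when the guard is built, and the check itself becomes a plain indexed load. The helper `make_none_guard` and the list standing in for `fast_locals` are hypothetical, used only for illustration; a real check would presumably run in C against the frame's locals array.

```python
def make_none_guard(code, name):
    """Build a guard for one fixed code object (hypothetical helper)."""
    # co_varnames never changes for a given code object, so the
    # name -> slot lookup happens once, at guard-construction time.
    slot = code.co_varnames.index(name)

    def check(fast_locals):
        # fast_locals is modeled as a plain list here (an assumption);
        # the check is an indexed load plus an identity comparison.
        return fast_locals[slot] is None

    return check


def example(a, b, flag=None):
    return a + b


# "flag" is the third local of `example`, so the guard reads slot 2.
guard = make_none_guard(example.__code__, "flag")
```

With this setup, `guard([1, 2, None])` passes and `guard([1, 2, 3])` fails, with no dictionary or name lookup at check time.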
You can specify `--database=` or `args.database` to tell it what filename to use. Looks like it didn't have write access to the default one.
This is very cool and I love the approach. Good work! I think the biggest challenge will be coming up with the right heuristics for when to apply this. There...
I think starting with a simple heuristic makes sense, and perhaps some config to force-enable it. Hopefully we can find a robust heuristic. If you want to try out heuristics,...
No, but I agree we need that.
I think @bertmaher mentioned someone on his team would add the cooperative launch and `tl.atomic_load(ptr, sem="relaxed")` to Triton. It may make sense to delay turning this on by default until...
@pytorchbot merge