Jason Ansel comments

Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide.

Thanks! The yindex/xindex naming is an artifact of `torch.compile`'s Triton codegen and how we map to GPU grids there. I'll swap it for Halide output, though that shouldn't matter in this...

Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide.

Makes sense. I am hitting this error somewhat frequently, so a fix would be very helpful! If there is a way to get 64-bit indexing, that might also fix it...
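
A small plain-Python illustration of why 64-bit indexing would sidestep the error: a row-major flattened offset for a large enough buffer exceeds the int32 range, so Halide sees a signed overflow (undefined behavior) when it constant-folds the index. The shape below is hypothetical, and `wrap_int32` only emulates what two's-complement hardware would do; none of this calls Halide.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a Python int into int32."""
    return (value + 2**31) % 2**32 - 2**31


def flat_index(y: int, x: int, row_stride: int) -> int:
    """Row-major flattened offset, computed with unbounded Python ints."""
    return y * row_stride + x


# Hypothetical buffer: 70_000 rows x 40_000 columns (2.8e9 elements).
rows, cols = 70_000, 40_000
last = flat_index(rows - 1, cols - 1, cols)

fits_int32 = INT32_MIN <= last <= INT32_MAX
fits_int64 = INT64_MIN <= last <= INT64_MAX
print(f"last offset = {last}")  # 2799999999
print(f"fits int32: {fits_int32}, fits int64: {fits_int64}")  # False, True
print(f"int32 wraparound would give {wrap_int32(last)}")  # -1494967297
```

With int64 index arithmetic the same offset is representable, so there is nothing to fold into an overflow.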

[dynamo] Validate check_fn

I'm kind of surprised we didn't find this earlier... won't it just result in a recompile loop until we hit the cache limit?

Don't populate f_locals to check guards

Since guards are always attached to a specific (fixed) code object, `co_varnames`/`co_localsplusnames` is just a constant. Therefore, we could legally generate a guard check like `fast_locals[2] == Py_NONE`, with the...
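
A minimal plain-Python sketch of the idea, assuming the fast locals can be modeled as a list ordered like `co_varnames` (in CPython 3.11+ the real array follows `co_localsplusnames`, which also covers cell and free variables). The helper `make_none_guard` is hypothetical and is not Dynamo's actual guard code; it only shows that the name-to-slot lookup can happen once, at guard-build time, instead of materializing `f_locals` on every check.

```python
from typing import Any, Callable, Sequence


def make_none_guard(code, name: str) -> Callable[[Sequence[Any]], bool]:
    """Build a hypothetical guard that checks `name is None` by slot index.

    The index is resolved once, when the guard is built: the guard is tied
    to a single code object, whose co_varnames never change.
    """
    index = code.co_varnames.index(name)  # constant for this code object

    def guard(fast_locals: Sequence[Any]) -> bool:
        # No dict construction and no name lookup: just an indexed load.
        return fast_locals[index] is None

    return guard


def example(a, b=None):
    c = a
    return c


guard = make_none_guard(example.__code__, "b")
print(example.__code__.co_varnames)  # ('a', 'b', 'c')
print(guard(["x", None, "x"]))  # True
print(guard(["x", 1, "x"]))  # False
```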

Error report when running ./examples/rosenbrock/rosenbrock.py

You can specify `--database=` or `args.database` to tell it what filename to use. Looks like it didn't have write access to the default one.

Enable TorchInductor to Generate Matmuls Natively via `tl.dot`

This is very cool and I love the approach. Good work! I think the biggest challenge will be coming up with the right heuristics of when to apply this. There...

Enable TorchInductor to Generate Matmuls Natively via `tl.dot`

I think starting with a simple heuristic makes sense, and perhaps some config to force-enable it. Hopefully we can find a robust heuristic. If you want to try out heuristics,...
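
One possible shape for "a simple heuristic plus a force-enable config", sketched in plain Python. Everything here is hypothetical: `DotHeuristicConfig`, `should_use_tl_dot`, the FLOP threshold, and the minimum dimension are illustrative, not actual `torch._inductor.config` options. The 16-element minimum reflects the assumption that `tl.dot` rejects operands smaller than 16 in any dimension. The design point is that forcing bypasses the profitability rule but never the legality check.

```python
from dataclasses import dataclass


@dataclass
class DotHeuristicConfig:
    # Hypothetical knobs for illustration; not real torch._inductor.config fields.
    force_tl_dot: bool = False  # bypass the cost heuristic (but not legality)
    min_block_dim: int = 16  # assumed minimum M/N/K accepted by tl.dot
    min_flops: int = 2**20  # hypothetical profitability threshold


def should_use_tl_dot(m: int, n: int, k: int, cfg: DotHeuristicConfig) -> bool:
    """Decide whether a matmul-like pattern should be lowered to tl.dot."""
    # Legality first: forcing must never produce a kernel tl.dot rejects.
    if min(m, n, k) < cfg.min_block_dim:
        return False
    if cfg.force_tl_dot:
        return True
    # Simple profitability rule: enough multiply-adds to amortize the setup.
    return 2 * m * n * k >= cfg.min_flops


default = DotHeuristicConfig()
forced = DotHeuristicConfig(force_tl_dot=True)
print(should_use_tl_dot(64, 64, 64, default))  # False: 524288 < 2**20
print(should_use_tl_dot(64, 64, 64, forced))  # True
print(should_use_tl_dot(128, 128, 64, default))  # True: 2097152 >= 2**20
print(should_use_tl_dot(8, 64, 64, forced))  # False: M below the minimum
```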

[inductor] Cooperative reductions

No, but I agree we need that.

[inductor] Cooperative reductions

I think @bertmaher mentioned someone on his team would add the cooperative launch and `tl.atomic_load(ptr, sem="relaxed")` to Triton. It may make sense to delay turning this on by default until...
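
To make the dependency concrete, here is a CPU-side simulation of a cooperative reduction, using Python threads as stand-ins for GPU blocks and `threading.Barrier` as the grid-wide sync. It is a sketch of the general technique, not Inductor's generated kernel. On a GPU, the cooperative launch is what guarantees all blocks are resident at once, so the barrier cannot deadlock. The reads of other blocks' partials after the barrier are where something like the proposed relaxed atomic load would come in.

```python
import threading


def cooperative_sum(data: list[float], num_blocks: int) -> list[float]:
    """Simulate a cooperative (grid-synchronized) reduction with threads.

    Each "block" reduces its own slice into a shared workspace, waits at a
    grid-wide barrier, then reads every partial, so each block ends up
    holding the full result.
    """
    workspace = [0.0] * num_blocks  # one partial result per block
    barrier = threading.Barrier(num_blocks)  # stands in for a grid sync
    results = [0.0] * num_blocks
    chunk = (len(data) + num_blocks - 1) // num_blocks

    def block(pid: int) -> None:
        lo, hi = pid * chunk, min((pid + 1) * chunk, len(data))
        workspace[pid] = sum(data[lo:hi])  # phase 1: local partial reduction
        barrier.wait()  # after this, every partial has been written
        results[pid] = sum(workspace)  # phase 2: combine all partials

    threads = [threading.Thread(target=block, args=(pid,)) for pid in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(cooperative_sum([float(i) for i in range(100)], num_blocks=4))
# -> [4950.0, 4950.0, 4950.0, 4950.0]
```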

[inductor] Cooperative reductions

@pytorchbot merge