Jason Ansel
@pytorchbot merge
Ah I think you are correct. The autotuner runs the kernel multiple times, which is not safe to do for in-place kernels because the first run clobbers the input data....
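A minimal sketch of the failure mode, using a toy in-place op and a hypothetical `naive_autotune` benchmarking loop (illustrative only, not the real autotuner):

```python
import torch

def naive_autotune(kernel, x, n_trials=3):
    # A naive autotuner benchmarks by invoking the kernel several times
    # and picking the fastest config. Each extra call re-applies the
    # in-place update, corrupting the data the "real" run would see.
    for _ in range(n_trials):
        kernel(x)

x = torch.ones(4)
naive_autotune(lambda t: t.add_(1), x)
# x is now 4.0, not the 2.0 a single in-place add would produce:
# the benchmarking reruns clobbered the input.
print(x)
```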
`tl.log()` still throws the same `LLVM ERROR: Broken function found, compilation aborted!` given a float64 tensor. This is annoying, but has the easy workaround of changing it to `tl.libdevice.log`. With...
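A minimal sketch of the workaround, assuming a Triton build that exposes `tl.libdevice` (newer releases have since reorganized these functions); the kernel name and launch parameters are illustrative:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def log_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    # tl.log(x) hits the LLVM error on float64 inputs; the libdevice
    # variant compiles cleanly.
    y = tl.libdevice.log(x)
    tl.store(out_ptr + offs, y, mask=mask)

x = torch.rand(4096, dtype=torch.float64, device="cuda")
out = torch.empty_like(x)
log_kernel[(triton.cdiv(x.numel(), 1024),)](x, out, x.numel(), BLOCK=1024)
```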
I am fine with using fast hardware approximations by default. So I'd even suggest defaulting to `use_fast_math=True` (perhaps you meant to write that in your example?). A lot of the time...
@ptillet this may be related to #574 as they are both issues with 1xN or Nx1 blocks. This issue is forcing us to generate worse/slower code (turning Nx1 blocks into...
> Yes, this was actually a very minor issue. I have it fixed, but will merge along with the atomic_add and the rand constexpr fix tonight probably :)

Awesome! Thanks...
I believe so, but I'll let @pyjhzwh confirm. This one is pretty awkward to work around, because there are a few different ways to write a broadcasting load: 1) reshape the index...
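For reference, a minimal sketch of one common way to write a broadcasting load in Triton, where a 1D (per-row) load is reshaped so it broadcasts across an MxN tile; the kernel and argument names here are hypothetical:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_col_bias(x_ptr, bias_ptr, out_ptr, stride, M: tl.constexpr, N: tl.constexpr):
    rows = tl.arange(0, M)
    cols = tl.arange(0, N)
    offs = rows[:, None] * stride + cols[None, :]
    x = tl.load(x_ptr + offs)        # MxN tile
    bias = tl.load(bias_ptr + rows)  # per-row (Mx1) data, loaded as a 1D block
    # The [:, None] reshape lets the 1D load broadcast across the tile's columns.
    tl.store(out_ptr + offs, x + bias[:, None])

M, N = 64, 64
x = torch.randn(M, N, device="cuda")
bias = torch.randn(M, device="cuda")
out = torch.empty_like(x)
add_col_bias[(1,)](x, bias, out, x.stride(0), M=M, N=N)
```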
I'm hitting the same thing on CUDA 11.6 with latest master, so it isn't CUDA 11.4 specific. A clean build doesn't seem to help.
@pytorchbot merge