peterbell10 comments

Results: 134 comments of peterbell10

Add aten._unsafe_masked_index

Given this is a pre-existing issue in the C++ backend, I think it's fine to open an issue for the Intel folks to look at and just skip the test...

triton generates unnecessary shared memory stores/loads

> Which version of pytorch are you on? I tried to run your code, but failed. AttributeError: type object 'torch._C.Generator' has no attribute 'graphsafe_set_state'

Given that `graphsafe_set_state` doesn't appear in...

test_sum can fail with unstable summation

In PyTorch we use a cascading summation which is much more stable than the naive summation. IIRC NumPy uses the exact same algorithm in fact. The only difference is that...
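To illustrate why cascading (pairwise) summation is more stable than a naive running total, here is a minimal pure-Python sketch. It is not PyTorch's or NumPy's actual kernel: the function names `naive_sum` and `cascading_sum` and the block size of 8 are assumptions chosen for illustration, and `math.fsum` is used only as a high-accuracy reference.

```python
import math


def naive_sum(values, lo=0, hi=None):
    """Left-to-right running total; rounding error grows roughly with the length."""
    if hi is None:
        hi = len(values)
    total = 0.0
    for i in range(lo, hi):
        total += values[i]
    return total


def cascading_sum(values, lo=0, hi=None, block=8):
    """Pairwise summation: split the range in half, sum each half, then add.

    Small ranges fall back to the naive loop (block size is an arbitrary
    illustrative choice). Error grows roughly with log(length) instead.
    """
    if hi is None:
        hi = len(values)
    if hi - lo <= block:
        return naive_sum(values, lo, hi)
    mid = (lo + hi) // 2
    return cascading_sum(values, lo, mid, block) + cascading_sum(values, mid, hi, block)


values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum of the stored doubles

naive_err = abs(naive_sum(values) - reference)
cascade_err = abs(cascading_sum(values) - reference)

print(f"naive error:     {naive_err:.3e}")
print(f"cascading error: {cascade_err:.3e}")
```

On inputs like this, the cascading result should land much closer to the reference than the naive one, which is why a test comparing against a naive sum with a tight tolerance can fail.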

[inductor] Add abs to index_propagation

@pytorchbot merge

[inductor] Add abs to index_propagation

@pytorchbot merge

[inductor] Add abs to index_propagation

@pytorchbot merge -r

[inductor] Improve stability of scaled softmax

> Curious - do you think the right place to apply this is at the graph level or perhaps lower inside Inductor?

I think this has to be done at...

[inductor] Improve stability of scaled softmax

> What happens to the other PRs after this one? Are they still relevant? Do we still want to disable fma at large?

This is a tough question. On the stack...

[inductor] Improve stability of scaled softmax

@pytorchbot merge

[inductor] Improve stability of scaled softmax

> Why did the last commit fix a perf regression?

Basically `where(other >= 0, inp, -inp)` was being saved for backwards, which resulted in additional buffers and writes, even if...