peterbell10

134 comments by peterbell10

CUDA builds don't require any additional work for the main changes here. `.cu` files can use the per-operator headers as they would any other header and `TensorBase` was used first...

> you can change `-abs(a-b)` to be `min - max`

I think the former is better actually, since `min`/`max` has 4 cycles of latency while `abs` and `neg` are bitwise...
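For reference, the two forms being compared compute the same value. A minimal Python sketch, assuming plain scalar floats (the helper names `neg_abs_diff` and `min_minus_max` are hypothetical and not from the PR; the snippet only checks equivalence and says nothing about instruction latency):

```python
import itertools


def neg_abs_diff(a: float, b: float) -> float:
    """The `-abs(a - b)` form: one subtraction, then abs and negate."""
    return -abs(a - b)


def min_minus_max(a: float, b: float) -> float:
    """The suggested `min - max` form: smaller operand minus larger operand."""
    return min(a, b) - max(a, b)


# Illustrative finite inputs (assumed values, not taken from the PR).
samples = [-3.5, -1.0, 0.0, 0.25, 2.0, 1e10]
for a, b in itertools.product(samples, repeat=2):
    # Both forms yield a non-positive value of equal magnitude.
    assert neg_abs_diff(a, b) == min_minus_max(a, b)
```

Since the results agree, the choice between the two forms comes down to cost on the target hardware, which is what the reply is arguing about.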

@pytorchbot merge -f "CI failure is present on master"

@lezcano I did see the distribution uses but not sure what our reproducibility guarantees are, so didn't want to touch it. For the logit functions, I tried `log(x) - log1p(-x)`...
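To make the expression concrete, here is a minimal Python sketch of the `log(x) - log1p(-x)` form next to the textbook `log(x / (1 - x))` form. The function names are hypothetical, scalar `math` calls stand in for whatever tensor code the PR touches, and since the comment is cut off, no conclusion about which form is preferable should be read into this:

```python
import math


def logit_ratio(x: float) -> float:
    """Textbook form: log of the odds ratio x / (1 - x)."""
    return math.log(x / (1.0 - x))


def logit_log1p(x: float) -> float:
    """Form mentioned in the comment: log(x) - log1p(-x).

    log1p(-x) evaluates log(1 - x) without forming 1 - x explicitly.
    """
    return math.log(x) - math.log1p(-x)


# Illustrative probabilities strictly inside (0, 1); assumed values.
for p in (0.1, 0.25, 0.5, 0.9):
    assert math.isclose(logit_ratio(p), logit_log1p(p), rel_tol=1e-12, abs_tol=1e-12)
```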

There is still a caffe2 job in PyTorch trunk CI so I think it still has value.

@pytorchbot merge -f "Dynamo failure is pre-existing on master"