peterbell10
CUDA builds don't require any additional work for the main changes here. `.cu` files can use the per-operator headers as they would any other header, and `TensorBase` was used first...
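As an illustration of the pattern (the header paths and the `empty` call below are my own example based on ATen's per-operator header layout, not lines from this PR), a `.cu` or `.cpp` translation unit can include just the per-operator headers it uses alongside `TensorBase`:

```cpp
// Hypothetical translation unit: pulls in only what it needs instead of the
// monolithic ATen/Functions.h.
#include <ATen/core/Tensor.h>
#include <ATen/core/TensorBase.h>  // TensorBase: sizes/strides/options, no op methods
#include <ATen/ops/empty.h>        // per-operator header declaring at::empty

// Allocate an output buffer matching the input's shape and options.  Taking
// TensorBase keeps this file decoupled from the full Tensor method surface.
at::Tensor make_output(const at::TensorBase& self) {
  return at::empty(self.sizes(), self.options());
}
```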
> you can change `-abs(a-b)` to be `min - max`

I think the former is better actually, since `min`/`max` has 4 cycles of latency while `abs` and `neg` are bitwise...
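For reference, the two expressions compute the same value: `-|a - b| == min(a, b) - max(a, b)` for all `a`, `b`. A small standalone sketch (illustrative names, not the PR's kernel code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// -|a - b| and min(a, b) - max(a, b) are mathematically identical; the
// argument above is that the abs/neg form is cheaper, since float abs and
// negation only touch the sign bit while min/max carries more latency.
float neg_abs_diff(float a, float b) {
  return -std::abs(a - b);
}

float min_minus_max(float a, float b) {
  return std::min(a, b) - std::max(a, b);
}

int main() {
  assert(neg_abs_diff(2.0f, 5.0f) == min_minus_max(2.0f, 5.0f));
  assert(neg_abs_diff(-3.0f, 1.5f) == min_minus_max(-3.0f, 1.5f));
  return 0;
}
```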
@pytorchbot merge
@pytorchbot merge -f "CI failure is present on master"
@lezcano I did see the distribution uses, but I'm not sure what our reproducibility guarantees are, so I didn't want to touch it. For the logit functions, I tried `log(x) - log1p(-x)`...
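For context, `logit(x) = log(x / (1 - x))`, and since `log(1 - x) == log1p(-x)` the same value can be written as `log(x) - log1p(-x)`. A quick illustrative comparison (my own example, not the actual kernel from this PR):

```cpp
#include <cmath>
#include <cstdio>

// Direct form: log(x / (1 - x)).
double logit_direct(double x) {
  return std::log(x / (1.0 - x));
}

// Split form: log(x) - log1p(-x).  log1p evaluates log(1 + y) accurately
// for y near zero, so the second term stays accurate when x is small.
double logit_split(double x) {
  return std::log(x) - std::log1p(-x);
}

int main() {
  for (double x : {1e-9, 0.25, 0.5, 0.75, 1.0 - 1e-9}) {
    std::printf("x=%.10g  direct=%.17g  split=%.17g\n",
                x, logit_direct(x), logit_split(x));
  }
  return 0;
}
```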
@pytorchbot merge
@dagitses PTAL
There is still a caffe2 job in PyTorch trunk CI, so I think it still has value.
@pytorchbot merge
@pytorchbot merge -f "Dynamo failure is pre-existing on master"