bhack
Just to check if it could at least compile with the current CUB version. Do you know what this CI failure is?

```cuda
/usr/local/cuda/include/cub/agent/agent_select_if.cuh(264): error: function "at::native::<unnamed>::NonZeroOp::operator() [with T=c10::complex]" cannot...
```
I think we need the `cub/cub/agent/agent_select_if.cuh` changes introduced in https://github.com/NVIDIA/cccl/pull/1379. So this means we need to wait for the next CUDA 12.4 update and also make it conditional.
@ezyang The new [CUDA 12.5](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) delivers `CUB` `2.4.0`, so it could be enough for this workaround.
> if you're willing to wait for cuda 12.5 :)

This version is required for the workaround API. A full upstream solution will require waiting for more CUDA releases.
The current status for testing/using the workaround in this PR is to have `CUB 2.4.0`. Since PyTorch currently uses the CUB shipped with the official CUDA distribution, this means we...
@ezyang See https://github.com/NVIDIA/cccl/issues/1454#issuecomment-2138705800
We are waiting for PyTorch to adopt CUDA 12.5.
@ezyang In case you're interested, it appears that Meta's release of SAM 2 yesterday might be affected by this issue: https://github.com/facebookresearch/segment-anything-2/issues/44.
the bot does not forgive...
Also, on A100/L4, where it works without runtime errors for the original input sizes, we get a different runtime error with other input sizes:

```python
aten.index_put_(buf4, [reinterpret_tensor(buf0, (1, 1, s2*s3, 14 + s2, 14...
```