bhack
Just to check if it could at least compile with the current CUB version. Do you know what this CI failure is?

```cuda
/usr/local/cuda/include/cub/agent/agent_select_if.cuh(264): error: function "at::native::<unnamed>::NonZeroOp::operator() [with T=c10::complex]" cannot...
```
I think we need the `cub/cub/agent/agent_select_if.cuh` changes introduced in https://github.com/NVIDIA/cccl/pull/1379. So this means we need to wait for the next CUDA 12.4 update and also make it conditional.
@ezyang The new [CUDA 12.5](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) delivers `CUB` `2.4.0`, so it could be enough for this workaround.
> if you're willing to wait for cuda 12.5 :)

This version is required for the workaround API. A full upstream solution will require waiting for more CUDA releases.
The current status for testing/using the workaround in this PR is to have `CUB 2.4.0`. Since PyTorch currently uses the CUB shipped with the official CUDA distribution, this means we...
@ezyang See https://github.com/NVIDIA/cccl/issues/1454#issuecomment-2138705800
We are waiting for PyTorch to adopt CUDA 12.5.
@ezyang In case you're interested, it appears that Meta's release of SAM 2 yesterday might be affected by this issue: https://github.com/facebookresearch/segment-anything-2/issues/44.
the bot does not forgive...
Also, on A100/L4, where it works without runtime errors for the original input sizes, we get a different runtime error with other input sizes:

```python
aten.index_put_(buf4, [reinterpret_tensor(buf0, (1, 1, s2*s3, 14 + s2, 14...
```