bhack

Results 1417 comments of bhack

Is this also going to impact boolean masking ops? E.g. `mytensor[mask]`

I could be interested, but: - As this is at the C++/CUDA level, do we have a nightly cache to speed up the C++ build? If not, I suppose it will take...

In case anyone is interested in the upstream ticket https://github.com/NVIDIA/cccl/issues/47

@ezyang Is this enough once it is merged/released: https://github.com/NVIDIA/cccl/issues/1422

@ezyang I don't know the ETA for the upstream merge but what do you think about the @elstehle temp wrapper at https://github.com/NVIDIA/cccl/pull/1612 ?

I think it could still be risky to temporarily adopt a solution like that wrapper:

> We want to have more elaborate test coverage by adding tests for large number...

This is also going to happen in Inductor-emitted code. E.g. ```python aten.index_put_(buf4, [reinterpret_tensor(buf0, (1, 1, s2*s3, 14 + s2, 14 + s3), (0, 0, 196 + (14*s2) + (14*s3)...

Also the just-released segment-anything v2: https://github.com/facebookresearch/segment-anything-2/issues/44

> fyi, the `cub::DeviceSelect` algorithm will be updated to support more than `INT_MAX` input elements when this PR is merged (nearly completed): [NVIDIA/cccl#2400](https://github.com/NVIDIA/cccl/pull/2400)

Thank you for the update. Do you...
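For readers following the `INT_MAX` discussion, here is a minimal pure-Python sketch of the general idea behind working around a per-call element limit: split the input into chunks of at most `INT_MAX` elements, run the mask selection on each chunk, and concatenate the results. The helper names `chunk_bounds` and `masked_select_chunked` are hypothetical and used only for illustration. This is not how `cub::DeviceSelect`, the linked wrapper, or PyTorch implement it.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed integer


def chunk_bounds(n, max_chunk=INT_MAX):
    """Yield (start, stop) pairs that cover range(n) in pieces of at most max_chunk."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    for start in range(0, n, max_chunk):
        yield start, min(start + max_chunk, n)


def masked_select_chunked(values, mask, max_chunk=INT_MAX):
    """Illustrative equivalent of values[mask], processed one chunk at a time.

    Hypothetical helper: each chunk stays within the per-call element limit,
    and the per-chunk results are concatenated in order.
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    selected = []
    for start, stop in chunk_bounds(len(values), max_chunk):
        chunk_values = values[start:stop]
        chunk_mask = mask[start:stop]
        selected.extend(v for v, keep in zip(chunk_values, chunk_mask) if keep)
    return selected
```

With a small `max_chunk` the behaviour is easy to check by hand: `masked_select_chunked([10, 20, 30, 40, 50], [True, False, True, True, False], max_chunk=2)` processes three chunks and keeps the selected elements in their original order.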

See https://github.com/pytorch/pytorch/issues/125981#issuecomment-2110734485