bhack

Results: 1417 comments of bhack

Is there a way to use batching in the torch.hub/torch.load example?
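To illustrate what batched inference through a hub-loaded model would look like, here is a minimal sketch. The `torch.hub.load` call is commented out and replaced by a stand-in module, since `'owner/repo'` and `'model_name'` are placeholders, not the real entrypoint of the example in question; the real model may also require per-image preprocessing rather than a plain NCHW batch.

```python
import torch

# Placeholder for the actual hub entrypoint (assumption, not the real API):
# model = torch.hub.load('owner/repo', 'model_name', pretrained=True)
model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in module for the sketch
model.eval()

imgs = torch.rand(4, 3, 32, 32)  # batch of 4 RGB images stacked along dim 0
with torch.no_grad():
    preds = model(imgs)          # single forward pass over the whole batch
print(preds.shape)  # torch.Size([4, 1, 32, 32])
```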

@ZachL1 Is that torch.hub code handling the camera intrinsics differently from the HF demo code?

This issue is still reproducible in Code Insiders. Any update?

If you are interested, we also have a few related threads at:
https://github.com/pytorch/pytorch/issues/130150 (selective_scan custom ops)
https://github.com/pytorch/pytorch/issues/95408#issuecomment-2543348770 (native selective_scan and associative_scan)
https://github.com/pytorch/pytorch/issues/120189 (mamba native)

/cc @ezyang @eqy This is an exploratory black-box PR, as I don't have free CUDA resources right now and we don't have a quick way to set up the env to...

> Should we add a (presumably large tensor) test for this? Do we have an `INT_MAX` test already somewhere that we could expand?

As we don't have a specific CUDA test, do we want to find a workaround from Python? Can you suggest one from `grep -R torch.nonzero test/`?
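As a sketch of what such a Python-side check could look like, here is a small correctness test for `torch.nonzero`. This is only an illustration of the shape of the test: a real `INT_MAX`-scale version would need to allocate more than 2**31 elements (the kind of case PyTorch gates behind large-tensor test decorators), which is far beyond what this sketch exercises.

```python
import torch

# Small-scale sketch of a nonzero correctness check; the real test
# would use a tensor with > INT_MAX elements on a CUDA device.
x = torch.zeros(1000, dtype=torch.uint8)
x[::7] = 1                        # mark every 7th element
idx = torch.nonzero(x).flatten()  # indices of the nonzero entries

assert idx.numel() == int(x.sum())     # one index per nonzero element
assert bool((x[idx] != 0).all())       # every returned index is nonzero
print(idx[:3].tolist())  # [0, 7, 14]
```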

I think I am going to close this, as `cub::DispatchSelectIf` will probably be slower than the `cub::DeviceSelect::Flagged` we are currently using. We probably need to wait upstream for https://github.com/NVIDIA/cccl/issues/1422 What...

@ezyang Do you think we can open a new ticket to lower this with Triton `where` and `sum`? https://github.com/pytorch/pytorch/blob/a174c536f8f32b41da9efa647e364196423468a5/torch/_inductor/lowering.py#L2187C20-L2187C35 Edit: The ticket is at https://github.com/pytorch/pytorch/issues/126003
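For intuition, the `where`/`sum` idea behind such a lowering can be sketched as stream compaction: a boolean mask gives the output size via `sum`, and a prefix sum over the mask assigns each kept element its destination slot. This is only a sketch of the concept in eager PyTorch, not Inductor's actual generated code.

```python
import torch

# Stream-compaction formulation of nonzero (1-D case):
x = torch.tensor([0, 3, 0, 5, 7, 0])
mask = x != 0
count = int(mask.sum())                 # 'sum' yields the output size
positions = torch.cumsum(mask, 0) - 1   # destination slot per kept element

out = torch.empty(count, dtype=torch.long)
src = torch.arange(x.numel())
out[positions[mask]] = src[mask]        # scatter the kept indices
print(out.tolist())  # [1, 3, 4]
```

This matches `torch.nonzero(x).flatten()` for the same input.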

Ok, thanks, so I am going to close it, as I don't have the env and currently no spare GPU compute to write a brand-new test and recompile it. At...