bhack
Is there a way to use a batch of inputs in the torch.hub / `torch.hub.load` example?
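For illustration, a minimal sketch of what batched usage of a torch.hub model could look like; the repo name, entrypoint, and input shape below are placeholders, not taken from the actual example:

```python
import torch

# Placeholder repo, entrypoint, and input shape: just a sketch of passing a
# batched tensor through a torch.hub model, not the actual example's API.
model = torch.hub.load("some_org/some_repo", "some_entrypoint", pretrained=True)
model.eval()
batch = torch.randn(4, 3, 224, 224)  # e.g. a batch of 4 RGB images
with torch.no_grad():
    out = model(batch)
print(out.shape if torch.is_tensor(out) else type(out))
```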
@ZachL1 Is that torch.hub code handling the camera intrinsics differently from the HF demo code?
This issue is still reproducible in Code Insiders. Any update?
If you are interested, we also have a few related threads at: https://github.com/pytorch/pytorch/issues/130150 (selective_scan custom ops), https://github.com/pytorch/pytorch/issues/95408#issuecomment-2543348770 (native selective_scan and associative_scan), and https://github.com/pytorch/pytorch/issues/120189 (native Mamba).
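For context, a plain-Python reference of the selective-scan recurrence those threads are about; the shapes and elementwise form are assumptions made only for illustration, and the fused/native kernels discussed in the issues aim to replace exactly this kind of sequential loop:

```python
import torch

# Reference loop for the selective-scan recurrence h_t = A_t * h_{t-1} + B_t * x_t.
# Illustration only; not the fused/native implementation discussed in the issues.
def selective_scan_reference(A, B, x):
    # A, B, x: (seq_len, hidden)
    h = torch.zeros_like(x[0])
    out = []
    for t in range(x.shape[0]):
        h = A[t] * h + B[t] * x[t]
        out.append(h)
    return torch.stack(out)

seq_len, hidden = 8, 4
A = torch.rand(seq_len, hidden)
B = torch.rand(seq_len, hidden)
x = torch.randn(seq_len, hidden)
print(selective_scan_reference(A, B, x).shape)  # torch.Size([8, 4])
```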
/cc @ezyang @eqy This is an exploratory black-box PR, as I don't have free CUDA resources right now and we don't have a quick way to set up the env to...
> Should we add a (presumably large-tensor) test for this? Do we have an `INT_MAX` test already somewhere that we could expand?
As we don't have a specific CUDA test, do we want to find a workaround from Python? Can you suggest one from `grep -R torch.nonzero test/`?
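A hedged sketch of what such a Python-level test could look like; the class/test names and size threshold are placeholders of mine, and running it would need a CUDA device with enough memory (the bool input alone is over 2 GB):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

# Hypothetical large-tensor test for torch.nonzero past INT_MAX elements.
class TestNonzeroLarge(TestCase):
    def test_nonzero_past_int_max(self):
        if not torch.cuda.is_available():
            self.skipTest("CUDA required")
        n = 2**31 + 1  # just past INT_MAX
        x = torch.zeros(n, dtype=torch.bool, device="cuda")
        x[-1] = True  # single nonzero element at the very end
        idx = torch.nonzero(x)
        self.assertEqual(idx.numel(), 1)
        self.assertEqual(idx.item(), n - 1)

if __name__ == "__main__":
    run_tests()
```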
I think I am going to close this, as `cub::DispatchSelectIf` will probably be slower than the `cub::DeviceSelect::Flagged` we are currently using. We probably need to wait for https://github.com/NVIDIA/cccl/issues/1422 upstream. What...
@ezyang Do you think we can open a new ticket to lower this with Triton `where` and `sum`? https://github.com/pytorch/pytorch/blob/a174c536f8f32b41da9efa647e364196423468a5/torch/_inductor/lowering.py#L2187C20-L2187C35 Edit: The ticket is at https://github.com/pytorch/pytorch/issues/126003
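To illustrate the decomposition idea (not the actual Inductor lowering): the count of nonzeros is a `sum` reduction over a boolean mask, and the indices can be recovered with `where`, both of which Inductor can already lower to Triton:

```python
import torch

# Illustration only: nonzero count as a reduction, indices via where.
x = torch.tensor([0.0, 3.0, 0.0, 5.0])
count = (x != 0).sum()             # number of nonzero elements
indices = torch.where(x != 0)[0]   # same indices as torch.nonzero for 1-D input
assert count.item() == 2
assert torch.equal(indices, torch.nonzero(x).squeeze(1))
```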
Ok thanks, so I am going to close it, as I don't have the env or spare GPU compute right now to write a brand new test and recompile it. At...