FBGEMM
Allow jagged_index_select to accept pre-computed output shape
Summary: The CPU kernel API of jagged_index_select already accepts num_dense_output_rows as an argument. Generalize this to the CUDA kernel as well, which avoids a CPU-blocking .item() call in the CUDA kernel when users pre-compute the value.
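
To illustrate what pre-computing the output shape means, here is a minimal pure-Python sketch of the operator's semantics. It is not the FBGEMM implementation: the helper name jagged_index_select_ref, its signature, and the list-based data layout are assumptions made for illustration. It assumes num_dense_output_rows is the sum of the lengths of the selected segments, and that the output size must be known before the output can be allocated, which is presumably why the CUDA path otherwise needs a host-side read.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def jagged_index_select_ref(
    values: Sequence[Row],
    lengths: Sequence[int],
    indices: Sequence[int],
    num_dense_output_rows: Optional[int] = None,
) -> Tuple[List[Row], List[int]]:
    """Hypothetical reference semantics, not the FBGEMM API.

    `values` holds the dense rows of all jagged segments back to back,
    `lengths[i]` is the row count of segment i, and `indices` picks segments.
    """
    # offsets[i] is the first dense row of segment i.
    offsets = [0] + list(accumulate(lengths))
    output_lengths = [lengths[i] for i in indices]

    if num_dense_output_rows is None:
        # Fallback: derive the output size from the selected lengths.
        # On a GPU this reduction lives on the device, so reading it back
        # (e.g. via .item()) would block the host.
        num_dense_output_rows = sum(output_lengths)

    # The output is allocated up front, which is why its size is needed here.
    output: List[Row] = [[] for _ in range(num_dense_output_rows)]
    write_pos = 0
    for i in indices:
        segment = list(values[offsets[i]:offsets[i + 1]])
        output[write_pos:write_pos + len(segment)] = segment
        write_pos += len(segment)

    if write_pos != num_dense_output_rows or len(output) != num_dense_output_rows:
        raise ValueError("num_dense_output_rows does not match the selected segments")
    return output, output_lengths


# Three segments with 1, 2, and 3 rows; select segment 2, then segment 0.
values = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
lengths = [1, 2, 3]
indices = [2, 0]

# Pre-compute the output row count on the host, where lengths are already known.
precomputed_rows = sum(lengths[i] for i in indices)
selected, selected_lengths = jagged_index_select_ref(
    values, lengths, indices, precomputed_rows
)
```

When the caller already has the lengths on the host, passing the pre-computed count lets the size-dependent allocation proceed without waiting on a device-side result.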
Differential Revision: D54085880