[Enhancement] Memory waste in segmented_unique

Open z52527 opened this issue 1 month ago • 0 comments

Problem

The segmented_unique function allocates table_num separate buffers for tmp_unique_indices and another table_num for tmp_accumulated_frequency_output, each of size num_total (the size of keys). Tables are processed serially, and each table uses only a small portion of its buffer, so temporary GPU memory scales with table_num × num_total even though most of it is never touched.
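To put rough numbers on the waste, here is a minimal sketch of the footprint arithmetic for one of the two temporary arrays. The helper names and the 8-byte element size are assumptions for illustration and are not taken from index_calculation.cu.

```cpp
#include <cstddef>

// Bytes held when every table gets its own buffer of num_total elements
// (the allocation pattern described in this issue).
constexpr std::size_t per_table_bytes(std::size_t table_num,
                                      std::size_t num_total,
                                      std::size_t elem_size) {
    return table_num * num_total * elem_size;
}

// Bytes held when one buffer of num_total elements is shared by all tables.
constexpr std::size_t shared_bytes(std::size_t num_total,
                                   std::size_t elem_size) {
    return num_total * elem_size;
}
```

With hypothetical values table_num = 8, num_total = 1 << 20 and 8-byte elements, the per-table layout holds 64 MiB for a single temporary array, against 8 MiB for a shared one.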

Location

File: /corelib/dynamicemb/src/index_calculation.cu

std::vector<at::Tensor> tmp_unique_indices(table_num);
for (int i = 0; i < table_num; ++i) {
    tmp_unique_indices[i] = at::empty_like(keys);  // Each buffer size: num_total
}

Todo

Use a single shared buffer of size num_total that is reused across tables, or write directly to the final output buffer using slices with offsets.
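The following is a host-side sketch of that idea, not the actual fix. It assumes std::vector as a stand-in for at::Tensor, a hypothetical table_offsets array with table_num + 1 entries marking each table's key range, and sort plus run-length counting as a stand-in for whatever deduplication the real kernel performs. It only illustrates that one scratch buffer can be reused across serially processed tables while results are written into the final output at a running offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result layout: uniques of all tables packed back to back.
struct SegmentedUniqueResult {
    std::vector<int64_t> unique_keys;        // packed unique keys
    std::vector<int64_t> frequencies;        // count for each unique key
    std::vector<std::size_t> unique_offsets; // table_num + 1 entries
};

SegmentedUniqueResult segmented_unique_shared_scratch(
    const std::vector<int64_t>& keys,
    const std::vector<std::size_t>& table_offsets) {
    assert(!table_offsets.empty() && table_offsets.back() == keys.size());
    const std::size_t table_num = table_offsets.size() - 1;
    const std::size_t num_total = keys.size();

    SegmentedUniqueResult out;
    out.unique_keys.resize(num_total);  // upper bound, trimmed at the end
    out.frequencies.resize(num_total);
    out.unique_offsets.assign(table_num + 1, 0);

    // One scratch buffer of size num_total, reused by every table.
    std::vector<int64_t> scratch(num_total);
    std::size_t write_pos = 0;  // running offset into the final output

    for (std::size_t t = 0; t < table_num; ++t) {
        const std::size_t begin = table_offsets[t];
        const std::size_t n = table_offsets[t + 1] - begin;

        // Copy this table's segment into the scratch buffer and sort it.
        std::copy(keys.data() + begin, keys.data() + begin + n, scratch.data());
        std::sort(scratch.data(), scratch.data() + n);

        // Run-length encode straight into the output slice for table t.
        std::size_t i = 0;
        while (i < n) {
            std::size_t j = i;
            while (j < n && scratch[j] == scratch[i]) ++j;
            out.unique_keys[write_pos] = scratch[i];
            out.frequencies[write_pos] = static_cast<int64_t>(j - i);
            ++write_pos;
            i = j;
        }
        out.unique_offsets[t + 1] = write_pos;
    }

    out.unique_keys.resize(write_pos);
    out.frequencies.resize(write_pos);
    return out;
}
```

In this model the scratch buffer could be shrunk further to the largest single segment, but sizing it to num_total matches the proposal above and avoids a pass to find the maximum.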



z52527, Nov 26 '25 07:11