
[IMP] Decrease Merge Peak Memory Usage of ConnectedComponents

Open · VibhuJawa opened this issue 1 year ago · 0 comments

Describe the bug

On GPU SKUs with less memory, the broadcast merge in ConnectedComponents runs out of memory. We need to reduce the peak memory footprint of that merge without significantly hurting performance.

https://github.com/NVIDIA/NeMo-Curator/blob/f7441ea19df3200067545f630472fa937f285d86/nemo_curator/modules/fuzzy_dedup.py#L1605-L1655
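One common way to lower the peak of a broadcast merge is to split the large side into row batches and join each batch against the small, broadcast side separately, so the join's temporary buffers only ever cover one batch. This is a general technique and not the code at the link above. The sketch uses pandas as a CPU stand-in for cuDF, and `batched_merge` and `num_batches` are hypothetical names chosen for illustration.

```python
import pandas as pd


def batched_merge(large: pd.DataFrame, small: pd.DataFrame, on: str,
                  num_batches: int) -> pd.DataFrame:
    """Hypothetical helper: inner-merge `large` with `small` one row batch at a time.

    Each merge call only sees one slice of `large`, so the join's temporary
    buffers scale with the batch size instead of the whole table. The
    concatenated result still has to fit in memory.
    """
    if num_batches < 1:
        raise ValueError("num_batches must be >= 1")
    if len(large) == 0:
        # Nothing to batch; a single merge keeps the output schema.
        return large.merge(small, on=on, how="inner")
    batch_size = -(-len(large) // num_batches)  # ceiling division
    parts = [
        large.iloc[start:start + batch_size].merge(small, on=on, how="inner")
        for start in range(0, len(large), batch_size)
    ]
    return pd.concat(parts, ignore_index=True)


# Usage: the batched result matches a single full merge.
large = pd.DataFrame({"uid": [1, 2, 3, 4, 5], "x": list("abcde")})
small = pd.DataFrame({"uid": [2, 4, 5], "group": [10, 20, 20]})
out = batched_merge(large, small, on="uid", num_batches=2)
print(out)
```

The trade-off is more, smaller join calls: larger `num_batches` lowers the per-call peak but adds per-call overhead, which is the performance cost the issue asks to keep small.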

CC: @ayushdg

VibhuJawa, Nov 15 '24 13:11