NeMo-Curator
[IMP] Decrease Merge Peak Memory Usage of ConnectedComponents
Describe the bug
On smaller GPU SKUs, we are running out of memory during the broadcast merge in Connected Components. We need to reduce the peak memory footprint of this merge without hurting performance too much.
https://github.com/NVIDIA/NeMo-Curator/blob/f7441ea19df3200067545f630472fa937f285d86/nemo_curator/modules/fuzzy_dedup.py#L1605-L1655
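One common way to bound the peak of a broadcast-style merge is to merge the large side against the small table one slice at a time, so only one slice's join intermediates exist at once. The sketch below is a hypothetical illustration of that idea, not the NeMo-Curator implementation: `batched_merge` and `batch_size` are made-up names, and pandas stands in for cuDF (whose `merge`, `concat`, and `iloc` calls used here have the same shape). Note that the concatenated output is the same size as a single merge; the saving is on transient per-merge buffers, and it is largest when each batch's result is reduced or written out before the next batch runs.

```python
import pandas as pd


def batched_merge(left: pd.DataFrame, right: pd.DataFrame, on: str, batch_size: int) -> pd.DataFrame:
    """Hypothetical helper: inner-merge `left` with a small `right` table in slices.

    Only one slice of `left` is joined at a time, so each join's temporary
    buffers scale with `batch_size` instead of len(left).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pieces = []
    for start in range(0, len(left), batch_size):
        chunk = left.iloc[start:start + batch_size]
        # `right` plays the role of the broadcast (replicated) small table.
        pieces.append(chunk.merge(right, on=on, how="inner"))
    if not pieces:
        # Empty `left`: a plain merge still yields the expected columns.
        return left.merge(right, on=on, how="inner")
    return pd.concat(pieces, ignore_index=True)


if __name__ == "__main__":
    left = pd.DataFrame({"id": [1, 2, 3, 4, 5], "x": [10, 20, 30, 40, 50]})
    right = pd.DataFrame({"id": [2, 4, 6], "y": ["a", "b", "c"]})
    print(batched_merge(left, right, on="id", batch_size=2))
```

Choosing `batch_size` trades memory for throughput: smaller batches lower the peak but launch more, smaller merges, which is the performance cost the issue asks to keep in check.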
CC: @ayushdg