SemDedup bug fix for single element cluster

Open praateekmahajan opened this issue 6 months ago • 0 comments

Description

Without this fix, a single-element cluster ends up with a datatype of int32 where the other columns are float32. As a result, the semdedup_pruning_tables cannot all be read when someone tries to read them combined.
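The dtype mismatch described above can be sketched with plain pandas. This is only an illustration, not the actual SemDedup code: the helper names `make_pruning_table` and `normalize_dtypes` and the column names `id` and `cosine_sim_score` are hypothetical, and the int32 column is constructed by hand to mirror the report. pandas itself would silently upcast on `concat`, so the sketch shows the mismatch and the cast that removes it rather than the read failure.

```python
import numpy as np
import pandas as pd


def make_pruning_table(ids, scores):
    """Hypothetical stand-in for one cluster's pruning table."""
    return pd.DataFrame({"id": ids, "cosine_sim_score": scores})


def normalize_dtypes(df):
    """Cast the score column to float32 so every table shares one schema."""
    return df.astype({"cosine_sim_score": "float32"})


# A multi-element cluster: scores come out as float32.
multi = make_pruning_table(
    [0, 1, 2], np.array([0.0, 0.91, 0.87], dtype=np.float32)
)

# A single-element cluster: the score is built here as an integer,
# mirroring the int32 column described in the report (assumption).
single = make_pruning_table([3], np.array([0], dtype=np.int32))

# The two tables disagree on the column dtype before normalization.
dtypes_before = (
    multi["cosine_sim_score"].dtype,
    single["cosine_sim_score"].dtype,
)

# After the cast, both tables have the same schema and combine cleanly.
combined = pd.concat(
    [normalize_dtypes(multi), normalize_dtypes(single)], ignore_index=True
)
```

Casting at write time, rather than at read time, keeps every file on disk consistent, so a reader that expects a single schema across all tables does not need special handling.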

Usage

# Add snippet demonstrating usage

Checklist

  • [ ] I am familiar with the Contributing Guide.
  • [ ] New or Existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

praateekmahajan • Apr 22 '25 21:04