NeMo-Curator
SemDedup bug fix for single element cluster
Description
Without this fix, the table for a single-element cluster ends up with a dtype of int32, versus float32 in the other tables. Because of that mismatch, the semdedup_pruning_tables cannot be read when someone tries to load all of them together.
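The dtype mismatch described above can be illustrated with a small pandas sketch. This is not the code from this PR: the column names (`id`, `score`), the `EXPECTED_DTYPES` mapping, and the `normalize` helper are hypothetical and only show the general idea of casting every per-cluster table to one fixed schema before writing it.

```python
import numpy as np
import pandas as pd

# Hypothetical schema for a per-cluster pruning table (illustrative names only).
EXPECTED_DTYPES = {"id": "int64", "score": "float32"}


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every column to the expected dtype so all tables share one schema."""
    return df.astype(EXPECTED_DTYPES)


# A regular cluster: the score column is already float32.
multi = pd.DataFrame(
    {"id": [1, 2, 3], "score": np.array([0.9, 0.8, 0.7], dtype="float32")}
)

# A single-element cluster whose placeholder score was stored as an integer.
single = pd.DataFrame({"id": [7], "score": np.array([0], dtype="int32")})

# Before normalizing, the two schemas disagree on the score column.
schemas_match_before = multi.dtypes.to_dict() == single.dtypes.to_dict()

# After normalizing, both tables have identical column dtypes.
multi_fixed = normalize(multi)
single_fixed = normalize(single)
schemas_match_after = multi_fixed.dtypes.to_dict() == single_fixed.dtypes.to_dict()

print(schemas_match_before, schemas_match_after)
```

Casting at write time keeps each file's schema identical, so a reader that expects one schema across all files does not have to reconcile int32 and float32 columns afterwards.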
Usage
# Add snippet demonstrating usage
Checklist
- [ ] I am familiar with the Contributing Guide.
- [ ] New or Existing tests cover these changes.
- [ ] The documentation is up to date with these changes.