NeMo-Curator
Refactor modules into single classes and add dataframe backend checking
Description
This PR makes the following changes:
- Refactor `ScoreFilter` and `DocumentFilter` to be a single module.
- Refactor `Modify` and `DocumentModifier` to be a single module.
- Add a generic `Module` abstract base class that can validate the necessary backend of each module.
- Add a `ToBackend` module that converts between backends in a pipeline.
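
The backend-checking idea in the last two items can be illustrated with a minimal, dependency-free sketch. This is not the code in this PR: `Dataset`, `MinWordsFilter`, the `input_backend` argument, and the `call` method are hypothetical names chosen for illustration, and the backends are represented by plain strings rather than real pandas or cuDF dataframes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a document dataset tagged with its backend."""
    rows: List[dict]
    backend: str  # e.g. "pandas" or "cudf" (plain labels in this sketch)


class Module(ABC):
    """Hypothetical base class: each module declares the backend it requires."""

    def __init__(self, input_backend: Optional[str] = None):
        # None means the module accepts a dataset on any backend.
        self.input_backend = input_backend

    def __call__(self, dataset: Dataset) -> Dataset:
        # Validate the backend before running the module's own logic.
        if self.input_backend is not None and dataset.backend != self.input_backend:
            raise ValueError(
                f"{type(self).__name__} requires backend {self.input_backend!r}, "
                f"got {dataset.backend!r}"
            )
        return self.call(dataset)

    @abstractmethod
    def call(self, dataset: Dataset) -> Dataset:
        """Module-specific processing, implemented by subclasses."""


class ToBackend(Module):
    """Hypothetical converter: accepts any backend and retags the dataset."""

    def __init__(self, backend: str):
        super().__init__(input_backend=None)
        self.backend = backend

    def call(self, dataset: Dataset) -> Dataset:
        # A real implementation would convert the underlying dataframe here.
        return Dataset(rows=list(dataset.rows), backend=self.backend)


class MinWordsFilter(Module):
    """Hypothetical filter that scores and filters in a single class."""

    def __init__(self, min_words: int):
        super().__init__(input_backend="pandas")
        self.min_words = min_words

    def call(self, dataset: Dataset) -> Dataset:
        kept = [r for r in dataset.rows if len(r["text"].split()) >= self.min_words]
        return Dataset(rows=kept, backend=dataset.backend)


data = Dataset(rows=[{"text": "a b c"}, {"text": "a"}], backend="cudf")
word_filter = MinWordsFilter(min_words=2)

# Calling word_filter(data) directly would raise ValueError (wrong backend),
# so the pipeline converts first.
result = word_filter(ToBackend("pandas")(data))
```

Putting the check in the base class `__call__` means individual modules only declare what they need, and a conversion step can be placed in the pipeline wherever the required backend changes.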
Usage
# Add snippet demonstrating usage
Checklist
- [ ] I am familiar with the Contributing Guide.
- [ ] New or Existing tests cover these changes.
- [ ] The documentation is up to date with these changes.