Refactor modules to be in a single class and dataframe backend checking

Open ryantwolf opened this issue 1 year ago • 0 comments

Description

This PR makes four changes:

  • Refactor ScoreFilter and DocumentFilter to be a single module.
  • Refactor Modify and DocumentModifier to be a single module.
  • Add a generic Module abstract base class that validates the dataframe backend each module requires.
  • Add a ToBackend module that converts between backends in a pipeline.
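A rough sketch of how these pieces might fit together, in plain Python rather than the actual NeMo-Curator code. Everything here is an assumption for illustration: the `Dataset` stand-in, the `input_backend` attribute, the `call` method, the `MinWordsFilter` class, and the `run_pipeline` helper are hypothetical names, and the real `Module` and `ToBackend` signatures in this PR may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Dataset:
    """Toy stand-in for a dataframe-backed dataset (hypothetical)."""
    backend: str  # e.g. "pandas" or "cudf"
    rows: list    # list of {"text": ...} records


class Module(ABC):
    """Base class that checks the dataset backend before running."""

    def __init__(self, input_backend="any"):
        # "any" means the module accepts every backend.
        self.input_backend = input_backend

    @abstractmethod
    def call(self, dataset):
        """Module-specific logic, implemented by subclasses."""

    def __call__(self, dataset):
        # Validate the backend once, in the base class, for all modules.
        if self.input_backend not in ("any", dataset.backend):
            raise ValueError(
                f"{type(self).__name__} requires backend "
                f"{self.input_backend!r}, got {dataset.backend!r}"
            )
        return self.call(dataset)


class ToBackend(Module):
    """Switches the dataset to another backend inside a pipeline."""

    def __init__(self, backend):
        super().__init__(input_backend="any")
        self.backend = backend

    def call(self, dataset):
        # A real implementation would convert the underlying dataframe.
        return Dataset(self.backend, list(dataset.rows))


class MinWordsFilter(Module):
    """Scoring and filtering combined in one class (hypothetical)."""

    def __init__(self, min_words):
        super().__init__(input_backend="pandas")
        self.min_words = min_words

    def score(self, text):
        return len(text.split())

    def call(self, dataset):
        kept = [r for r in dataset.rows
                if self.score(r["text"]) >= self.min_words]
        return Dataset(dataset.backend, kept)


def run_pipeline(dataset, modules):
    """Apply each module in order."""
    for module in modules:
        dataset = module(dataset)
    return dataset
```

In this sketch, calling `MinWordsFilter(2)` directly on a `"cudf"` dataset raises a `ValueError`, while placing `ToBackend("pandas")` ahead of it in the pipeline lets the filter run. Putting the check in `Module.__call__` means individual modules only declare the backend they need.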

Usage

# Add snippet demonstrating usage

Checklist

  • [ ] I am familiar with the Contributing Guide.
  • [ ] New or Existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

ryantwolf avatar Nov 18 '24 22:11 ryantwolf