Vibhu Jawa
I'm keeping this PR around for benchmarking purposes, so I've moved it to draft.
@copilot, please fix these ruff issues:
```
Error: nemo_curator/stages/base.py:37:1: W293 Blank line contains whitespace
Error: nemo_curator/stages/base.py:40:1: W293 Blank line contains whitespace
Error: nemo_curator/stages/base.py:45:26: EM102 Exception must not use an...
```
/ok to test [14ca3d9](https://github.com/NVIDIA-NeMo/Curator/pull/1038/commits/14ca3d945ec426fe1da2b864098bf09a38a42d83)
@copilot, please fix these errors:
```
FAILED tests/stages/text/modules/test_filters.py::TestHeuristicFilters::test_pornographicurls - AssertionError: Expected DocumentBatch(task_id='batch_1_PornographicUrlsFilter', dataset_name='test_1', data= text 0 no url 1 fine url https://www.nvidia.com/en-us/, _stage_perf=[], _metadata={}, _uuid='a0838bf8-6254-4bd2-8c2f-a7f3a7e39e10') but got DocumentBatch(task_id='batch_1_document_filter', dataset_name='test_1',...
```
@copilot, please address the review comments.
@copilot, please fix the ruff issues below:
```
Error: nemo_curator/stages/base.py:51:1: W293 Blank line contains whitespace
Error: nemo_curator/stages/base.py:54:1: W293 Blank line contains whitespace
Error: The process '/opt/hostedtoolcache/ruff/0.11.4/x86_64/ruff' failed with exit code...
```
/okay to test [41ccb36](https://github.com/NVIDIA-NeMo/Curator/pull/1038/commits/41ccb3654324541b1a89f61b1c27acbc3a05fc54)
@copilot, please fix:
```
=================================== FAILURES ===================================
______________________ TestPreviewStage.test_with_method _______________________

self =

    def test_with_method(self):
        """Test the with_ method for creating modified instances."""
        stage = PreviewStage()
        # Test modifying...
```
/okay to test [d3e4dbd](https://github.com/NVIDIA-NeMo/Curator/pull/1038/commits/d3e4dbd1ba0a9a2444f8def33b4a8f4daf68bcd0)
/ok to test [0cd716b](https://github.com/NVIDIA-NeMo/Curator/pull/1038/commits/0cd716bb43d01267470ff48cf5128ceca2e570fc)