
[fix] Only drop rows where NaN is not in target column

Open · ShreyaR opened this issue 3 years ago · 1 comment

This PR fixes the following bugs in preprocessing. Both exist because we preprocess training_set, val_set and test_set jointly, by concatenating them and passing the result to build_dataset.

  • While removing NaNs, we drop all rows from the test split because they have NaNs in the target column.
  • After concatenating the 3 dataframes, we don't set the split percentages based on the datasets' original sizes.
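The two fixes described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the code in this PR or Ludwig's actual preprocessing API: the `split` column, its 0/1/2 encoding, and the helpers `concat_splits`, `drop_nan_rows` and `split_fractions` are hypothetical names, and the sketch assumes the test split may legitimately lack target values.

```python
import pandas as pd

# Hypothetical name for the column recording which split a row came from.
SPLIT_COL = "split"
TRAIN, VAL, TEST = 0, 1, 2  # hypothetical encoding of the three splits


def concat_splits(train_df, val_df, test_df):
    """Concatenate the three splits, tagging every row with its origin."""
    frames = [
        df.assign(**{SPLIT_COL: split_id})
        for split_id, df in zip((TRAIN, VAL, TEST), (train_df, val_df, test_df))
    ]
    return pd.concat(frames, ignore_index=True)


def drop_nan_rows(df, target):
    """Drop rows with a missing target, except rows from the test split.

    Test rows are kept even when the target is NaN, so an unlabeled test
    set is not wiped out by joint preprocessing.
    """
    missing_target = df[target].isna()
    is_test = df[SPLIT_COL] == TEST
    return df[~(missing_target & ~is_test)].reset_index(drop=True)


def split_fractions(train_df, val_df, test_df):
    """Split fractions derived from the original sizes, not fixed defaults."""
    sizes = [len(train_df), len(val_df), len(test_df)]
    total = sum(sizes)
    return [size / total for size in sizes]
```

For example, with 4 training rows (one of them missing its target), 2 validation rows and 2 unlabeled test rows, `drop_nan_rows` removes only the unlabeled training row, and `split_fractions` returns `[0.5, 0.25, 0.25]`, which could then drive the split instead of default percentages.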

ShreyaR · Oct 20 '22 05:10

Unit Test Results

6 files (+2) · 6 suites (+2) · 3h 42m 5s :stopwatch: (+1h 22m 45s)
3 504 tests (+25): 3 367 passed :heavy_check_mark: (-35) · 79 skipped :zzz: (+2) · 58 failed :x: (+58)
10 459 runs (+336): 10 121 passed :heavy_check_mark: (+214) · 244 skipped :zzz: (+28) · 94 failed :x: (+94)

For more details on these failures, see this check.

Results for commit 4b697f0e. ± Comparison against base commit 578ba2ea.

:recycle: This comment has been updated with latest results.

github-actions[bot] · Oct 20 '22 07:10