ludwig
[fix] Only drop rows where NaN is not in target column
This PR fixes the following bugs in preprocessing. Both bugs exist because we preprocess `training_set`, `val_set` and `test_set` jointly: we concatenate them and pass the result to `build_dataset`.
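To make the joint-preprocessing setup concrete, here is a minimal pandas sketch of concatenating the three splits while remembering where each row came from. The helper `concat_splits` and the `split` column name are hypothetical illustrations, not Ludwig's actual code. The point is that once the frames are concatenated, each row's original split and the original split sizes are only recoverable if they are recorded explicitly.

```python
import pandas as pd

# Hypothetical column name used only in this sketch.
SPLIT_COLUMN = "split"


def concat_splits(train_df, val_df, test_df):
    """Concatenate train/val/test, tagging each row with its origin (0=train, 1=val, 2=test)."""
    tagged = [
        df.assign(**{SPLIT_COLUMN: split_id})
        for split_id, df in enumerate((train_df, val_df, test_df))
    ]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(tagged, ignore_index=True)
```

For example, concatenating a 3-row train frame, a 1-row val frame and a 2-row test frame yields 6 rows whose `split` values are `[0, 0, 0, 1, 2, 2]`, so the original sizes survive the concatenation.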
- While removing NaNs, we drop all rows from the test split because they have NaNs in the target column.
- After concatenating the three dataframes, we don't set the split percentages based on the datasets' original sizes.
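The two fixes can be sketched in pandas as follows. This is an illustration under stated assumptions, not the code in this PR. It reads the PR title as "when dropping NaN rows, ignore NaNs in the target column", so an unlabeled test split survives. It also computes split fractions from the original frame sizes, as the second bullet describes. Both `drop_nan_rows` and `split_fractions` are hypothetical helpers.

```python
import pandas as pd


def drop_nan_rows(df, target_columns):
    """Drop rows with NaNs in any non-target column.

    Assumption: NaN targets are kept (e.g. an unlabeled test split),
    which is one reading of "only drop rows where NaN is not in target column".
    """
    feature_columns = [c for c in df.columns if c not in target_columns]
    return df.dropna(subset=feature_columns)


def split_fractions(train_df, val_df, test_df):
    """Derive train/val/test proportions from the original (pre-concatenation) sizes."""
    sizes = [len(train_df), len(val_df), len(test_df)]
    total = sum(sizes)
    return [size / total for size in sizes]
```

With splits of 6, 2 and 2 rows, `split_fractions` returns `[0.6, 0.2, 0.2]`, rather than falling back to a default split ratio that ignores the user's data.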
Unit Test Results
6 files (+2), 6 suites (+2), 3h 42m 5s :stopwatch: (+1h 22m 45s)

| | Total | Passed :heavy_check_mark: | Skipped :zzz: | Failed :x: |
|---|---|---|---|---|
| Tests | 3 504 (+25) | 3 367 (-35) | 79 (+2) | 58 (+58) |
| Runs | 10 459 (+336) | 10 121 (+214) | 244 (+28) | 94 (+94) |
For more details on these failures, see this check.
Results for commit 4b697f0e. ± Comparison against base commit 578ba2ea.