Add multi-class AUC ROC.
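For context, a minimal sketch of how a multi-class ROC AUC scorer can be built on top of sklearn via autosklearn.metrics.make_scorer. The scorer name, the optimum/worst values, and the one-vs-one averaging choice are illustrative assumptions, not necessarily what this PR merges:

```python
# A minimal sketch, assuming make_scorer forwards extra keyword arguments
# to the underlying sklearn score function (as autosklearn's Scorer does).
import sklearn.metrics
from autosklearn.metrics import make_scorer

roc_auc_ovo = make_scorer(
    name="roc_auc_ovo",               # hypothetical name for illustration
    score_func=sklearn.metrics.roc_auc_score,
    optimum=1,
    worst_possible_result=0,
    greater_is_better=True,
    needs_proba=True,                 # ROC AUC needs class probabilities
    multi_class="ovo",                # one-vs-one averaging over class pairs
)
```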
Codecov Report
Merging #1581 (b867c48) into development (013d7ee) will decrease coverage by
0.37%. The diff coverage is 100.00%.
@@             Coverage Diff              @@
##           development    #1581      +/-   ##
===============================================
- Coverage        84.93%   84.55%   -0.38%
===============================================
  Files              155      155
  Lines            11898    11899       +1
  Branches          2058     2058
===============================================
- Hits             10105    10061      -44
- Misses            1246     1274      +28
- Partials           547      564      +17
Hi @deadly-panda,
I ran the automated tests and they all seem to pass except for one. Just a small change is needed for the meta-data. You can view the results by clicking on the details link of any of the failing tests.
You can run it locally with:
pytest test/test_scripts/test_metadata_generation.py
Hi @eddiebergman, thank you! Could you also check the discussion on the issue? Indeed, I had the same questions about the numerical values for the tests and some black issues in the sub-module. I will look into the failing test.
We can keep the discussion for the PR here.
For the unit tests, I'm not sure which numbers I should use. (I created a PR; you can take a look at the numbers I used.)
I'm not too sure either; I guess some specific examples you come up with are good enough. Maybe @mfeurer can check specifically, as I'm unfamiliar with the metric. I checked the autogluon tests and they don't test it either, while sklearn gives no specific examples either.
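One option that avoids trusting any particular reference number is to test degenerate inputs whose scores are known in closed form. A small sketch using sklearn directly (the test data here is made up for illustration):

```python
# Perfect probabilities give AUC exactly 1.0; completely uninformative
# (uniform) probabilities give 0.5 for every pairwise comparison.
import numpy as np
import sklearn.metrics

y_true = np.array([0, 1, 2, 0, 1, 2])

# One-hot probabilities that always rank the true class first.
y_perfect = np.eye(3)[y_true]
assert sklearn.metrics.roc_auc_score(y_true, y_perfect, multi_class="ovo") == 1.0
assert sklearn.metrics.roc_auc_score(y_true, y_perfect, multi_class="ovr") == 1.0

# Uniform probabilities: all scores tied, so every ROC curve is the diagonal.
y_uniform = np.full((6, 3), 1 / 3)
assert sklearn.metrics.roc_auc_score(y_true, y_uniform, multi_class="ovo") == 0.5
```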
Also, some tests were failing, for example TestMetric:test_classification_multiclass, and I had to add the roc_auc metric to the list of skipped metrics without really understanding why. Can you give more details about this?
The failing test for meta-learning was discussed above.
We skip metrics where the metric does not make sense for the task at hand, e.g. multilabel metrics don't make sense when doing binary classification. The structure of this test file is a bit of a mess, so don't worry about it not being clear.
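For illustration, the skip pattern amounts to filtering the registered scorers by task; a loose sketch (the set name and its contents are invented, not what the actual test file uses):

```python
# Assumes autosklearn.metrics.CLASSIFICATION_METRICS, a dict mapping scorer
# names to Scorer objects; the skip set below is hypothetical.
import autosklearn.metrics

SKIPPED_FOR_THIS_TASK = {"roc_auc"}  # scorers undefined for the tested task

applicable = {
    name: scorer
    for name, scorer in autosklearn.metrics.CLASSIFICATION_METRICS.items()
    if name not in SKIPPED_FOR_THIS_TASK
}
print(sorted(applicable))
```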
As for that test (and the others where you skip it), I checked them and the skips make sense except for one: test_classification_multiclass. Shouldn't this be one of the tests where the metric is explicitly used, as it's a multi-class metric?
When I ran black and mypy, there were some issues (from old code) with the common sub-module. Should I also push those changes in the same PR?
Ignore them, it's okay :)