
Implementation for #126: supporting evaluation & forward feature selection for custom metrics (e.g. lift)

Open sandervh14 opened this issue 3 years ago • 4 comments

=====> Do not merge yet! See below for explanation. <=====

Story Title

See issue #126: Tuning classification model on different metrics (lift, recall, precision,...)

Changes made

  • Adapted the forward feature selection class:
    • It now accepts metric, metric_args, metric_kwargs and higher_is_better as arguments.
    • Extensive documentation of what to pass in those parameters.
    • compute_model_performances(): passes metric, metric_args and metric_kwargs through to the model's evaluate().
    • _find_next_best_model(): same, plus adapted logic for selecting the next best model according to higher_is_better.
  • Adapted both LogisticRegressionModel and LinearRegressionModel to support a custom metric, metric_args and metric_kwargs in evaluate() (see the usage sketch below).
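
To make the new interface concrete, here is a minimal usage sketch. The metric-related parameter names follow the description above; X, y and the fitting step are placeholders, and the forward selection call is shown only as a comment because its exact constructor signature is not spelled out here.

from sklearn.metrics import top_k_accuracy_score

from cobra.model_building import LogisticRegressionModel

model = LogisticRegressionModel()
# ... fit the model on the selected predictors first ...

# Default behaviour: no metric given, so the model falls back to its
# built-in default (AUC for the classification model).
auc = model.evaluate(X, y)

# Custom metric: positional arguments go in metric_args,
# keyword arguments in metric_kwargs.
top2_accuracy = model.evaluate(
    X, y,
    metric=top_k_accuracy_score,
    metric_kwargs={"k": 2},
)

# The same arguments, plus higher_is_better, can be handed to the forward
# feature selection class, roughly like this (indicative only):
# selection = ForwardFeatureSelection(..., metric=top_k_accuracy_score,
#                                     metric_kwargs={"k": 2},
#                                     higher_is_better=True)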

I tried out my changes in a notebook on a model and they look good. :-) But do not merge yet! I'm stuck on the 25 failing unit tests, mainly due to the cryptic error "fixture 'mocker' not found":

test setup failed
file C:\Users\sander.vandenhautte\PycharmProjects\cobra\tests\model_building\test_forward_selection.py, line 55
      @pytest.mark.parametrize("model_type", ["classification", "regression"])
      def test_compute_model_performances(self, mocker, model_type):
E       fixture 'mocker' not found
>       available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.

=> Are we using fixtures here? Searching the codebase, I can't locate where mocker is instantiated.

Anyway, next to this question, the unit tests of course still need fixing before we can merge this into develop. But meanwhile Hendrik can use the issue branch, and I can discuss with you, Sam, how to fix the unit test shizzle above. :-) I'll plan that work in for next week.

How does the solution address the problem

This PR gives any data scientist the freedom to pick a different evaluation metric for forward feature selection, or just for a simple evaluation of a model.

Linked issues

Resolves #126

sandervh14 avatar Apr 08 '22 08:04 sandervh14

Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).
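
For reference, pytest-mock is the plugin that provides the mocker fixture; once it's installed, pytest picks it up automatically and injects mocker into any test that asks for it. A minimal standalone example (not cobra code):

import os.path

def test_mocker_fixture_is_available(mocker):
    # mocker.patch wraps unittest.mock.patch and undoes the patch
    # automatically at the end of the test.
    patched = mocker.patch("os.path.exists", return_value=True)
    assert os.path.exists("anything") is True
    patched.assert_called_once()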

sborms avatar Apr 13 '22 13:04 sborms

Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).

Thanks, I've added that info to the Contributing guidelines & workflow, in case others run into the same question as I did.

sandervh14 avatar May 25 '22 12:05 sandervh14

Hey Sam, just FYI: I'm committing the code changes I made based on your review, but some meetings came up in between (I'm going to start at a new client), so I couldn't finish the unit tests. I'll do that after the long weekend.

sandervh14 avatar May 25 '22 16:05 sandervh14

Hi Sam.

I've finished the changes for this one. I started fixing the failing unit tests, but during that work I noticed how much code LogisticRegressionModel and LinearRegressionModel actually copied from each other, so I've also abstracted the duplicate code out of those classes into a base class Model.
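
Roughly, the refactoring boils down to something like the sketch below. This is indicative only: names such as _default_metric and the regression default shown here are my shorthand, not the literal cobra code.

from sklearn.metrics import mean_squared_error, roc_auc_score


class Model:
    """Shared behaviour of the classification and regression models."""

    def evaluate(self, X, y, metric=None, metric_args=None, metric_kwargs=None):
        y_score = self.score_model(X)
        # Fall back to the model's default metric when none is given.
        chosen_metric = metric if metric is not None else self._default_metric
        return chosen_metric(y, y_score,
                             *(metric_args or ()),
                             **(metric_kwargs or {}))


class LogisticRegressionModel(Model):
    _default_metric = staticmethod(roc_auc_score)        # AUC by default


class LinearRegressionModel(Model):
    _default_metric = staticmethod(mean_squared_error)   # an (R)MSE-style default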

So maybe have a look at it again before merging, if you have the time. Or you can also just merge: in my opinion it's quite fine, and all existing unit tests plus the new ones run successfully.

Optional read. TL;DR: writing unit tests for code that accepts a function (the "metric") which is only specified at runtime (i.e., in the unit test scope, in the case of these tests) proved hard to monkey-patch "just like that" with pytest-mock, at least with my current knowledge of it. To make the problem, and the solution I went for, clear, here is a longer (sorry) explanation.

  1. How does the test function below manage to patch roc_auc_score by targeting "cobra.model_building.models.roc_auc_score", when in models.py (the file under test) roc_auc_score is imported (as from sklearn.metrics import roc_auc_score) rather than defined there? I would have expected the patch target to be "sklearn.metrics.roc_auc_score" instead. Or is this just pytest-mock magic, meaning that any function present in that module's namespace, whether defined there or imported, can be patched under that module's path?

(I've added pytest-mock to my development plan to learn this magic!)

from sklearn.metrics import top_k_accuracy_score

def test_evaluate_no_metric_specified(self, mocker):
    X = mock_data()
    y = pd.Series([1] * 5 + [0] * 5)

    def mock_roc_auc_score(y_true, y_score):
        return 0.79

    (mocker
     .patch("cobra.model_building.LogisticRegressionModel.score_model",
            mock_score_model_classification))

    (mocker
     .patch("cobra.model_building.models.roc_auc_score",
            mock_roc_auc_score))

    model = LogisticRegressionModel()
    actual = model.evaluate(X, y)  # implied: metric=None

    assert actual == 0.79
  2. My problem in writing the unit tests was that the following didn't work:
def mock_top_k_accuracy_score(y_true, y_score,
                              *, k=2, normalize=True,
                              sample_weight=None, labels=None):
    # ...
    return 0.14

(mocker
 .patch("sklearn.metrics.top_k_accuracy_score",
        mock_top_k_accuracy_score))

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

... which is why I went for passing a custom scoring function directly, rather than mocking one from sklearn:

def top_k_accuracy_score(y_true, y_score,
                         *, k=2, normalize=True,
                         sample_weight=None, labels=None):
    if not np.array_equal(y_true, y):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_true "
                         "argument.")
    if not np.array_equal(y_score,
                          mock_score_model_classification_output):
        raise ValueError("LogisticRegressionModel.evaluate() did not "
                         "succeed in passing the correct y_score "
                         "argument.")

    return 0.14

model = LogisticRegressionModel()
actual = model.evaluate(X, y,
                        metric=top_k_accuracy_score)

So, I sort of circumvented the mocking difficulty, but I think the custom function tests the functionality well in the end anyway. Do let me know if you think it should be done in another way, though (see also the short note below on why I think the patch targets behave this way).
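
For what it's worth, my current understanding of why the patch targets behave this way (happy to be corrected): mock.patch replaces the name in the module where it is looked up, not the original definition. Because models.py does from sklearn.metrics import roc_auc_score, the copy living under cobra.model_building.models is the one evaluate() actually calls, so that is the path to patch. And in the failing case above, patching "sklearn.metrics.top_k_accuracy_score" has no effect, because the test module had already imported the real function object and passed that very object via metric=. A tiny standalone sketch of the same effect, using plain unittest.mock (which pytest-mock wraps):

from unittest import mock

import sklearn.metrics
from sklearn.metrics import top_k_accuracy_score  # real function object bound here

with mock.patch("sklearn.metrics.top_k_accuracy_score", lambda *a, **kw: 0.14):
    # The module attribute is replaced ...
    assert sklearn.metrics.top_k_accuracy_score(None, None) == 0.14
    # ... but the name imported earlier still points at the original function,
    # which is why passing it directly as metric= bypasses the patch.
    assert top_k_accuracy_score is not sklearn.metrics.top_k_accuracy_score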

Finally: YAAY, a new pull request that is finished!

sandervh14 avatar Jun 01 '22 12:06 sandervh14