
[RF] Incorrect OOB calculation post-fit

Innixma opened this issue 3 years ago

Related: https://github.com/awslabs/autogluon/pull/1378

Describe the bug

A bug is present in scikit-learn-intelex<=2021.5 in the calculation of out-of-bag (OOB) predictions in random forest.

This bug only occurs if OOB predictions are computed post-fit; computing them during fit via oob_score=True works fine.

Computing OOB post-fit is much more desirable and is required in AutoGluon for efficient stack ensembling. For example, we may want to time the OOB calculation to estimate inference speed (and avoid the OOB time being added to fit time), or we may not know at training time whether we need OOB, and only decide later to compute it if necessary.

Note: Computing OOB post-fit is not in the public API of sklearn, but is very useful. The Colab notebook does this by replicating the inner logic used when oob_score=True.
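
A minimal sketch of that approach (the synthetic dataset and parameters here are illustrative assumptions, not the notebook's exact setup; _set_oob_score_and_attributes() is a private, version-dependent sklearn helper):

    # Hedged sketch: compute OOB post-fit by calling the private sklearn
    # helper that oob_score=True uses internally.
    from sklearnex import patch_sklearn
    patch_sklearn()  # dispatch supported estimators to scikit-learn-intelex

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Reference: OOB computed during fit (works fine).
    rf_during = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf_during.fit(X, y)

    # Post-fit: fit without oob_score, then replicate the oob_score=True logic.
    rf_after = RandomForestClassifier(n_estimators=100, oob_score=False, random_state=0)
    rf_after.fit(X, y)
    rf_after._set_oob_score_and_attributes(X, y)  # private sklearn API

    # Under scikit-learn-intelex<=2021.5, the post-fit score is overly optimistic.
    print(rf_during.oob_score_, rf_after.oob_score_)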

To Reproduce

Colab Notebook: https://colab.research.google.com/drive/1-NG9KF30-JbDqwL3xw0FM6juneCVCFXX?usp=sharing

Innixma avatar Dec 31 '21 03:12 Innixma

The issue has been reproduced; a fix is in progress.

lordoz234 avatar Jan 21 '22 14:01 lordoz234

Hi @Innixma, I just picked this up and I'm trying to get to the bottom of the problem.

The OOB error calculation happens in our C++ backend, and we do not implement our own version of _set_oob_score_and_attributes(), so trying to use it won't work. The OOB samples in a call to this function overlap with the C++ backend's training samples, which explains the overly optimistic log-loss.
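
To illustrate the mechanism (a sketch relying on private sklearn internals, which may change between versions):

    # Sketch: stock sklearn re-derives each tree's OOB rows from that tree's
    # random_state. With the daal4py backend the trees are grown in C++, so
    # these Python-side draws do not correspond to the actual bootstrap
    # samples, and the supposed "OOB" rows overlap the training rows.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.ensemble._forest import (
        _generate_unsampled_indices,
        _get_n_samples_bootstrap,
    )

    X, y = make_classification(n_samples=500, random_state=0)
    rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    n_samples_bootstrap = _get_n_samples_bootstrap(X.shape[0], rf.max_samples)
    for tree in rf.estimators_:
        # The rows sklearn *believes* this tree never saw during training;
        # for the C++ backend that belief does not hold.
        oob_idx = _generate_unsampled_indices(
            tree.random_state, X.shape[0], n_samples_bootstrap
        )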

There are multiple ways of achieving what you need, and making _set_oob_score_and_attributes() work as expected is just one of them.

In order to find the best solution, I'm trying to understand your use case a bit better. Do you have an estimate of the performance hit you get from calculating the OOB error during the fit, both in stock scikit-learn and in scikit-learn-intelex? I can do my own benchmarks, but something closer to your setup would be helpful.

Are you aware of any discussions about making the post-fit OOB error calculation part of the official scikit-learn API?

ahuber21 avatar Dec 28 '22 13:12 ahuber21

Just to elaborate on the performance question: a naive modification to your Colab example shows negligible impact (see screenshot below). So an example that demonstrates a significant performance hit would be much appreciated.

[Screenshot: timing comparison from the modified Colab example, showing negligible impact]

ahuber21 avatar Dec 30 '22 10:12 ahuber21

Starting in March 2022 I created a work-around for this:

    if self._daal:
        if params.get('warm_start', False):
            params['warm_start'] = False
        # FIXME: This is inefficient, but sklearnex doesn't support
        # computing oob_score after training, so request it during fit.
        params['oob_score'] = True

    model = model_cls(**params)

Therefore, even if it is challenging to fix, it isn't a dealbreaker for usage.

> Just to elaborate on the performance question: a naive modification to your Colab example shows negligible impact (see screenshot). So an example that demonstrates a significant performance hit would be much appreciated.

It is not about a performance hit. Rather, we can use post-fit OOB to estimate inference speed. If we compute it at fit time, it is included in the training time, and thus we wouldn't get an accurate inference-speed estimate unless we ran inference a second time.

> Are you aware of any discussions about making the post-fit OOB error calculation part of the official scikit-learn API?

I am not. I implemented this logic as custom code in AutoGluon. I would hope this becomes supported in the scikit-learn API at some point, but I lack the bandwidth to personally prioritize adding it to scikit-learn, and the AutoGluon implementation works well for our needs.

Innixma avatar Jan 04 '23 01:01 Innixma

Hi @Innixma thank you for the reply.

So, do I get this right: you don't care about the actual OOB error and are only interested in the inference time? In that case, you can actually use the code from your Colab example. The result is biased because it mixes training and testing data. Nevertheless, you are running inference on a deterministic subset of your training data, so the execution time is meaningful and you can compare it across different models.

Thinking about this made me wonder: is there a reason that you're using the OOB error as a proxy for inference speed? I mean, other than that it takes relatively few lines of code. Just as an idea (and to avoid using undocumented scikit-learn APIs), you could predict the entire training dataset, as in the sketch below. This approach has the same bias problem, but for performance measurement it is fine. Please let me know if I'm missing something.
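
A minimal sketch of that suggestion (the dataset and model are illustrative assumptions):

    # Time prediction over the full training set: biased as an accuracy
    # estimate (train == test), but fine as an inference-throughput proxy.
    import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=10_000, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    start = time.perf_counter()
    rf.predict(X)
    elapsed = time.perf_counter() - start
    print(f"~{X.shape[0] / elapsed:,.0f} rows/s")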

Given that you found a workaround in the meantime, and that exposing the OOB error calculation to Python (or even just the OOB indices) would require significant changes, we would prefer to close this issue without a fix.

Some technical background if you're interested. When you run .fit() on an Intel-optimized RF classifier, it dispatches the calculation to the daal4py backend. In the Python world, only a model interface exists, and since all calculations happen in the daal4py libraries, nothing but the fit result is visible to the Python class instance. In particular, the OOB error calculation follows the scikit-learn example and is only performed on request and during the fit.

While you can mimic the scikit-learn calculation by manually calling the appropriate helper functions, this will not work for the Intel-optimized RF, because the internal state (i.e. the random state and, in particular, the OOB indices) is never visible to you. Internally, daal4py instantiates a number of things, including the C++ equivalent of the random forest class, which contains a helper that stores the OOB indices. In order to access the OOB indices we would have to

  • persist the C++ random forest class (currently, it has the role of a local variable, created and destroyed in the scope of the .fit() function call)
  • expose the helper member
  • write a Python interface to retrieve the OOB indices from the helper

All of this is significant effort, and I hope you agree with my conclusion that an easier way to proxy the inference speed does not justify such changes.
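
For concreteness, a purely hypothetical sketch of what such a binding could look like; none of these names exist in daal4py or scikit-learn-intelex today, and the snippet assumes a fitted classifier as in the sketches above:

    # Hypothetical only -- maps the three bullet points above onto code.
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    daal_forest = rf._daal_forest             # (1) persisted C++ handle (hypothetical)
    oob_indices = daal_forest.oob_indices()   # (2)+(3) helper exposed through a
                                              #         Python binding (hypothetical)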

ahuber21 avatar Jan 12 '23 14:01 ahuber21

Hi @ahuber21,

Thanks for your detailed reply. This is completely reasonable, and as I mentioned, I have since found a work-around, so I am fine with you closing this issue.

> Thinking about this made me wonder: is there a reason that you're using the OOB error as a proxy for inference speed? I mean, other than that it takes relatively few lines of code.

Mainly to save compute, since OOB is required for things beyond inference speed in AutoGluon; it is a minor optimization. The other reason is that I would need to write custom logic within AutoGluon to measure inference speed correctly if not getting it from OOB. The correct way would be to use the training data, as you mention. I think that isn't too hard to do on my end, though.

Innixma avatar Jan 12 '23 17:01 Innixma