scikit-uplift
Why is perfect uplift calculated differently for uplift curve and qini curve?
💡 Feature request
Hi! Perfect uplift is needed to compute both the perfect uplift curve and the perfect Qini curve. Why does the formula that generates it differ between the two? Would it make sense to unify the perfect uplift formula?
perfect uplift curve
cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
summand = y_true if cr_num > tn_num else treatment
perfect_uplift = 2 * (y_true == treatment) + summand
perfect qini curve
perfect_uplift = y_true * treatment - y_true * (1 - treatment)
Same question. I also don't understand how perfect uplift is computed in perfect_uplift_curve; it isn't described anywhere.
@steprandelli @Irek21 Thanks for your question!
Recall that in the classical uplift problem we are dealing with two binary vectors: target is the value of the target variable, and treatment is the value of the influence (communication in marketing, treatment in medicine, etc.).
Thus, every observation falls into one of only 4 classes, and these are what we need to sort correctly: (1, 1), (0, 0), (0, 1), (1, 0).
To understand what an ideal curve should look like, you need to work out in what order to arrange these 4 classes (pairs). Obviously, reordering observations within a class does not change the value of the curve.
Let's call the ideal curve the curve with the maximum area under it. So, you need to understand how to rank 4 classes so that the area under the curve is maximal.
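To make this concrete, here is a minimal brute-force sketch that tries all 24 orderings of the 4 classes and keeps the one with the largest area. It is only an illustration: the helper names `curve_area` and `best_class_order` are hypothetical, pairs are read as (target, treatment), and the curve formulas are one common way to define uplift and Qini curves (cumulative treated and control responses, with division by zero treated as 0), which may differ in detail from the library's own code.

```python
from itertools import permutations

import numpy as np

# The four classes, read as (target, treatment) pairs. This reading is an assumption.
CLASSES = [(1, 1), (0, 0), (0, 1), (1, 0)]


def curve_area(y, t, kind):
    """Area under an uplift or Qini curve for observations already sorted best-first."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    n_t = np.cumsum(t)            # treated observations seen so far
    n_c = np.cumsum(1 - t)        # control observations seen so far
    y_t = np.cumsum(y * t)        # treated responders seen so far
    y_c = np.cumsum(y * (1 - t))  # control responders seen so far

    if kind == "uplift":
        # Assumed definition: difference of response rates, scaled by the sample size so far.
        rate_t = np.divide(y_t, n_t, out=np.zeros_like(y_t), where=n_t != 0)
        rate_c = np.divide(y_c, n_c, out=np.zeros_like(y_c), where=n_c != 0)
        values = (rate_t - rate_c) * (n_t + n_c)
    elif kind == "qini":
        # Assumed definition: treated responders minus rescaled control responders.
        ratio = np.divide(n_t, n_c, out=np.zeros_like(n_t), where=n_c != 0)
        values = y_t - y_c * ratio
    else:
        raise ValueError(f"unknown curve kind: {kind!r}")

    # Start the curve at the origin and integrate with the trapezoid rule (unit steps).
    values = np.concatenate([[0.0], values])
    return float(np.sum((values[1:] + values[:-1]) / 2))


def best_class_order(counts, kind):
    """Try every ordering of the four classes; return (area, order) with the largest area."""
    best = None
    for order in permutations(CLASSES):
        # Lay the observations out class by class, in the order being tested.
        y = [c[0] for c in order for _ in range(counts[c])]
        t = [c[1] for c in order for _ in range(counts[c])]
        area = curve_area(y, t, kind)
        if best is None or area > best[0]:
            best = (area, order)
    return best


# Made-up class counts, purely for illustration.
counts = {(1, 1): 3, (0, 0): 3, (0, 1): 2, (1, 0): 1}
for kind in ("uplift", "qini"):
    area, order = best_class_order(counts, kind)
    print(kind, order, area)
```

Running this for a few different class counts is a quick empirical check of an ordering, though it is no substitute for a proof, and it ignores how tied scores are handled.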
In the code, you can find an implementation of how these classes should be sorted. I hope someday we will add a section about metrics, in which there will be material about ideal curves.
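One way to read the two expressions quoted in the question is as scores whose descending order defines the class ranking. The sketch below wraps them in two hypothetical helpers (`perfect_scores_uplift` and `perfect_scores_qini` are names made up here, not library functions) and evaluates them on one observation per class, assuming pairs are read as (target, treatment).

```python
import numpy as np


def perfect_scores_uplift(y_true, treatment):
    """Score used for the perfect uplift curve, copied from the expression in the question."""
    cr_num = np.sum((y_true == 1) & (treatment == 0))  # Control Responders
    tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
    # The rarer of the two "mismatched" classes is ranked last.
    summand = y_true if cr_num > tn_num else treatment
    return 2 * (y_true == treatment) + summand


def perfect_scores_qini(y_true, treatment):
    """Score used for the perfect Qini curve, copied from the expression in the question."""
    return y_true * treatment - y_true * (1 - treatment)


# One observation per class, as (target, treatment): (1, 1), (0, 0), (0, 1), (1, 0)
y_true = np.array([1, 0, 0, 1])
treatment = np.array([1, 0, 1, 0])

uplift_scores = perfect_scores_uplift(y_true, treatment)
qini_scores = perfect_scores_qini(y_true, treatment)

print(uplift_scores.tolist())  # [3, 2, 1, 0]
print(qini_scores.tolist())    # [1, 0, 0, -1]
```

On this input the first score ranks all four classes strictly, with (1, 1) first and (0, 0) second, and the order of the last two decided by which of them is more frequent. The second score also puts (1, 1) first, but ties (0, 0) with (0, 1) and puts (1, 0) last, so the two formulas do encode different orderings.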
If you write up a proof of why the classes should be sorted this way, we will be happy to add it to the user guide.
Many thanks to @kirrlix1994 for consultations on the metrics issues.