VerticaPy
[Pipeline] Underlying SQL Metrics
Description:
There is currently no way to generate the SQL that underlies a metric table.
Tasks:
- [ ] machine_learning/metrics/classification.py: Create a way to get the underlying SQL of the metrics
- [x] machine_learning/metrics/regression.py: Add a parameter to
`regression_report` to return the SQL of the metric instead of the metric's result.
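The regression task above could look like the following minimal sketch. The parameter name `return_sql`, the metric list, and the query shape are all assumptions for illustration, not the final VerticaPy signature:

```python
# Sketch of a report that can either execute its metric queries or
# hand them back as SQL text (hypothetical `return_sql` parameter).

def regression_report(y_true: str, y_score: str, input_relation: str,
                      return_sql: bool = False):
    # Each metric name maps to the SQL aggregate that computes it.
    metrics_sql = {
        "mse": f"AVG(POWER({y_true} - {y_score}, 2))",
        "mae": f"AVG(ABS({y_true} - {y_score}))",
        "max_error": f"MAX(ABS({y_true} - {y_score}))",
    }
    query = (
        "SELECT "
        + ", ".join(f"{sql} AS {name}" for name, sql in metrics_sql.items())
        + f" FROM {input_relation}"
    )
    if return_sql:
        return query  # SQL generation instead of execution
    raise NotImplementedError("execution path elided in this sketch")
```

For example, `regression_report("y", "pred", "public.tbl", return_sql=True)` would return the full `SELECT` statement as a string instead of running it against Vertica.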
Definition of Done:
- SQL code generation is possible for regression and classification.
Concerns:
An example to show that we really don't use SQL to compute classification metrics anymore:
- how `accuracy_score` used to be computed in `_metrics.py` in the 0.12.0 version of VerticaPy: `AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)`
- how `accuracy_score` is computed now in `classification.py` in 1.0.0:
```python
def accuracy_score(...):
    return _compute_final_score(
        _accuracy_score,
        **locals(),
    )

def _accuracy_score(...):
    return (tp + tn) / (tp + tn + fn + fp)
```
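One option is to revive the 0.12.0-style template quoted above as a pure SQL generator. Wrapping it in a full `SELECT` and the alias `accuracy` are assumptions for illustration, not existing VerticaPy code:

```python
# Sketch: regenerate the old accuracy SQL instead of executing anything.
# The CASE template is the one quoted from _metrics.py (0.12.0).
ACCURACY_SQL = "AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)"

def accuracy_score_sql(y_true: str, y_score: str, input_relation: str) -> str:
    # Fill the column placeholders and wrap the aggregate in a complete query.
    return (
        f"SELECT {ACCURACY_SQL.format(y_true, y_score)} AS accuracy "
        f"FROM {input_relation}"
    )
```

This would give classification metrics a SQL path again without touching the confusion-matrix-based computation below.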
```python
def confusion_matrix(...) -> np.ndarray:
    res = _executeSQL(
        query=f"""
        SELECT
            CONFUSION_MATRIX(obs, response
            USING PARAMETERS num_classes = 2) OVER()
        FROM
            (SELECT
                DECODE({y_true}, '{pos_label}',
                       1, NULL, NULL, 0) AS obs,
                DECODE({y_score}, '{pos_label}',
                       1, NULL, NULL, 0) AS response
            FROM {input_relation}) VERTICAPY_SUBTABLE;""",
        title="Computing Confusion matrix.",
        method="fetchall",
    )
    return np.round(np.array([x[1:-1] for x in res])).astype(int)

def _compute_final_score(...):
    cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
    return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
```
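Since the confusion-matrix query is the only SQL left in the 1.0.0 classification path, returning that query before execution is one natural interception point. The helper below is hypothetical and only rebuilds the query string from the snippet above; note the metric itself is then computed in Python from the matrix, so a single query for the whole metric would still need old-style templates:

```python
# Sketch: rebuild the confusion-matrix query without executing it
# (hypothetical helper, not an existing VerticaPy function).

def confusion_matrix_sql(y_true: str, y_score: str,
                         input_relation: str, pos_label: str) -> str:
    return f"""SELECT
    CONFUSION_MATRIX(obs, response
    USING PARAMETERS num_classes = 2) OVER()
FROM
    (SELECT
        DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs,
        DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response
     FROM {input_relation}) VERTICAPY_SUBTABLE"""
```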
@zacandcheese did you find any solution for this one?