amazon-sagemaker-examples All metrics in statistics.json by Model Quality Monitor are "0.0 +/- 0.0", but confusion matrix is built correctly for multi-class classification!!

Link to the notebook Add the link to the notebook.

Describe the bug I have scheduled an hourly model-quality-monitoring job. Both jobs, ground-truth-merge and model-quality-monitoring, complete successfully without any errors, but all the metrics calculated by the job are "0.0 +/- 0.0", while the confusion matrix is calculated as expected.

To reproduce I followed the model-quality-monitoring notebook from sagemaker-examples, with only these changes:

I replaced the XGBoost churn model with a model trained on my own data.
The input to my endpoint is CSV, as in the example notebook, but the output is JSON.
I changed the problem type from BinaryClassification to MulticlassClassification wherever necessary.

Logs

Here's the statistics.json file that the model quality monitor saved to S3, with the confusion matrix built but with 0s in all the metrics:

{
  "version" : 0.0,
  "dataset" : {
    "item_count" : 4432,
    "start_time" : "2022-02-23T03:00:00Z",
    "end_time" : "2022-02-23T04:00:00Z",
    "evaluation_time" : "2022-02-23T04:13:20.193Z"
  },
  "multiclass_classification_metrics" : {
    "confusion_matrix" : {
      "0" : {
        "0" : 709,
        "2" : 530,
        "1" : 247
      },
      "2" : {
        "0" : 718,
        "2" : 497,
        "1" : 265
      },
      "1" : {
        "0" : 700,
        "2" : 509,
        "1" : 257
      }
    },
    "accuracy" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_recall" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_precision" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f0_5" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f1" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "weighted_f2" : {
      "value" : 0.0,
      "standard_deviation" : 0.0
    },
    "accuracy_best_constant_classifier" : {
      "value" : 0.3352888086642599,
      "standard_deviation" : 0.003252410977346705
    },
    "weighted_recall_best_constant_classifier" : {
      "value" : 0.3352888086642599,
      "standard_deviation" : 0.003252410977346705
    },
    "weighted_precision_best_constant_classifier" : {
      "value" : 0.1124185852154987,
      "standard_deviation" : 0.0021869336610830254
    },
    "weighted_f0_5_best_constant_classifier" : {
      "value" : 0.12965524348784485,
      "standard_deviation" : 0.0024239410000317335
    },
    "weighted_f1_best_constant_classifier" : {
      "value" : 0.16838092925822584,
      "standard_deviation" : 0.0028615098045768348
    },
    "weighted_f2_best_constant_classifier" : {
      "value" : 0.24009212108475822,
      "standard_deviation" : 0.003326031863819311
    }
  }
}
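As a reference point for what the zeroed metrics should roughly look like, accuracy can be recomputed by hand from the confusion matrix in this file. The following is a minimal sketch, not the monitor's own code; the helper name `accuracy_from_confusion` is made up for illustration. Accuracy is the diagonal sum over the total, so it does not depend on whether the outer keys are actual or predicted labels.

```python
# Confusion matrix copied verbatim from the statistics.json above.
confusion = {
    "0": {"0": 709, "2": 530, "1": 247},
    "2": {"0": 718, "2": 497, "1": 265},
    "1": {"0": 700, "2": 509, "1": 257},
}


def accuracy_from_confusion(matrix):
    """Return (correct, total, accuracy) for a nested-dict confusion matrix.

    Hypothetical helper for illustration only.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    # Diagonal entries: the inner key equals the outer key.
    correct = sum(row.get(label, 0) for label, row in matrix.items())
    return correct, total, correct / total


correct, total, accuracy = accuracy_from_confusion(confusion)
print(f"{correct} / {total} = {accuracy:.4f}")  # 1463 / 4432 = 0.3301
```

The total of 4432 matches `item_count`, so the matrix covers every merged record, and a value near 0.33 rather than 0.0 would be expected for `accuracy`.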

Here's what a couple of lines of captured data look like (prettified for readability; in the actual file each record is on a single line):

{
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,1,628,210,30",
            "encoding": "CSV"
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"label\":\"Transfer\",\"prediction\":2,\"probabilities\":[0.228256680901919,0.0,0.7717433190980809]}\n",
            "encoding": "JSON"
        }
    },
    "eventMetadata": {
        "eventId": "a7cfba60-39ee-4796-bd85-343dcadef024",
        "inferenceId": "5875",
        "inferenceTime": "2022-02-23T04:12:51Z"
    },
    "eventVersion": "0"
}
{
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,3,628,286,240",
            "encoding": "CSV"
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"label\":\"Adoption\",\"prediction\":0,\"probabilities\":[0.99,0.005,0.005]}\n",
            "encoding": "JSON"
        }
    },
    "eventMetadata": {
        "eventId": "7391ac1e-6d27-4f84-a9ad-9fbd6130498a",
        "inferenceId": "5876",
        "inferenceTime": "2022-02-23T04:12:51Z"
    },
    "eventVersion": "0"
}
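To make the shape of these records concrete: the model response under `endpointOutput.data` is itself a JSON string, so the prediction has to be decoded in a second step. A minimal sketch of that decoding follows, assuming the field names shown in the records above; `extract_prediction` is a hypothetical helper, not part of any SageMaker API.

```python
import json


def extract_prediction(captured_line):
    """Return (inference_id, prediction) from one line of a capture file.

    Hypothetical helper; field names are taken from the records above.
    """
    record = json.loads(captured_line)
    output = record["captureData"]["endpointOutput"]
    if output["encoding"] != "JSON":
        raise ValueError(f"unexpected encoding: {output['encoding']}")
    # The endpoint response is stored as a JSON string inside the record.
    payload = json.loads(output["data"])
    return record["eventMetadata"]["inferenceId"], payload["prediction"]


# First captured record from above, reduced to the fields used here.
sample = json.dumps({
    "captureData": {
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": '{"label":"Transfer","prediction":2,'
                    '"probabilities":[0.228256680901919,0.0,0.7717433190980809]}\n',
            "encoding": "JSON",
        }
    },
    "eventMetadata": {"inferenceId": "5875"},
})

inference_id, prediction = extract_prediction(sample)
print(inference_id, prediction, type(prediction).__name__)  # 5875 2 int
```

Note that the decoded prediction is an integer, while the inference ID is a string.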

Here's what a couple of lines from the ground truths that I uploaded to S3 look like (prettified for readability; in the actual file each record is on a single line):

{
  "groundTruthData": {
    "data": "0",
    "encoding": "CSV"
  },
  "eventMetadata": {
    "eventId": "1"
  },
  "eventVersion": "0"
}
{
  "groundTruthData": {
    "data": "1",
    "encoding": "CSV"
  },
  "eventMetadata": {
    "eventId": "2"
  },
  "eventVersion": "0"
}
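For reference, records of this shape can be generated one per line with a few lines of Python. This is only a sketch under the assumption, taken from the example notebook, that `eventMetadata.eventId` in the ground truth is what the merge job matches against the `inferenceId` of the captured record; `ground_truth_line` is a hypothetical helper name.

```python
import json


def ground_truth_line(inference_id, label):
    """Build one single-line ground-truth record (hypothetical helper).

    Assumption: eventId here is matched against the captured inferenceId.
    """
    return json.dumps({
        "groundTruthData": {"data": str(label), "encoding": "CSV"},
        "eventMetadata": {"eventId": str(inference_id)},
        "eventVersion": "0",
    })


# Reproduces the two records shown above, one JSON object per line.
lines = [ground_truth_line(1, 0), ground_truth_line(2, 1)]
print("\n".join(lines))
```

Each element of `lines` is a complete JSON object with no embedded newlines, which is the layout the records above have before prettifying.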

Here's what a couple of lines from the ground-truth-merged file look like (prettified for readability; in the actual file each record is on a single line). This file is created by the ground-truth-merge job, which is one of the two jobs that the model-quality-monitoring schedule runs:

{
  "eventVersion": "0",
  "groundTruthData": {
    "data": "2",
    "encoding": "CSV"
  },
  "captureData": {
    "endpointInput": {
      "data": "1,2,1050,37,1095",
      "encoding": "CSV",
      "mode": "INPUT",
      "observedContentType": "text/csv"
    },
    "endpointOutput": {
      "data": "{\"label\":\"Return_to_owner\",\"prediction\":1,\"probabilities\":[0.14512373737373732,0.6597074314574313,0.1951688311688311]}\n",
      "encoding": "JSON",
      "mode": "OUTPUT",
      "observedContentType": "application/json"
    }
  },
  "eventMetadata": {
    "eventId": "c9e21f63-05f0-4dec-8f95-b8a1fa3483c1",
    "inferenceId": "4432",
    "inferenceTime": "2022-02-23T04:00:00Z"
  }
}
{
    "eventVersion": "0",
    "groundTruthData": {
        "data": "1",
        "encoding": "CSV"
    },
    "captureData": {
        "endpointInput": {
            "data": "0,2,628,5,90",
            "encoding": "CSV",
            "mode": "INPUT",
            "observedContentType": "text/csv"
        },
        "endpointOutput": {
            "data": "{\"label\":\"Adoption\",\"prediction\":0,\"probabilities\":[0.7029623691085284,0.0,0.29703763089147156]}\n",
            "encoding": "JSON",
            "mode": "OUTPUT",
            "observedContentType": "application/json"
        }
    },
    "eventMetadata": {
        "eventId": "5f1afc30-2ffd-42cf-8f4b-df97f1c86cb1",
        "inferenceId": "4433",
        "inferenceTime": "2022-02-23T04:00:01Z"
    }
}
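One detail visible in the merged records is that the ground truth arrives as a string (`"2"`) while the prediction inside the JSON output is an integer (`1`). Nothing in this thread confirms that this mismatch is what zeroes the metrics; the sketch below only shows how accuracy could be recomputed from merged lines as a sanity check, with and without normalizing both sides to strings. The helpers `merged_line` and `count_matches` are hypothetical, and the third sample record is invented to show a matching pair.

```python
import json


def merged_line(truth, prediction):
    """Build a reduced merged record (hypothetical helper for this sketch)."""
    return json.dumps({
        "groundTruthData": {"data": truth, "encoding": "CSV"},
        "captureData": {
            "endpointOutput": {
                "data": json.dumps({"prediction": prediction}) + "\n",
                "encoding": "JSON",
            }
        },
    })


def count_matches(lines):
    """Return (strict_matches, normalized_matches, total) over merged lines."""
    strict = normalized = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        truth = record["groundTruthData"]["data"]
        payload = json.loads(record["captureData"]["endpointOutput"]["data"])
        predicted = payload["prediction"]
        total += 1
        strict += int(predicted == truth)  # int vs str never compares equal
        normalized += int(str(predicted) == str(truth).strip())
    return strict, normalized, total


lines = [
    merged_line("2", 1),  # first merged record above
    merged_line("1", 0),  # second merged record above
    merged_line("0", 0),  # hypothetical record with a correct prediction
]
strict, normalized, total = count_matches(lines)
print(strict, normalized, total)  # 0 1 3
```

Running the same comparison over the real merged file would give a value to check against the diagonal of the confusion matrix.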

Since the confusion matrix was constructed properly, I presume that I fed the data to SageMaker Model Monitor the right way. But why are all the metrics 0.0 while the confusion matrix looks as expected?

Feb 23 '22 13:02 naveenmarthala

Hello,

We are facing a similar issue with binary classification. The confusion matrix is built correctly, but the metrics are completely wrong. In our case, not all metrics are set to 0, some of them have values, but they are completely miscalculated.

@naveen-marthala have you found the issue?

Berto

Mar 22 '22 09:03 bertocast

No @bertocast, I haven't been able to find the cause of this issue or a fix for it.

If it is possible on your side, please raise this issue with AWS Support (I can't afford to subscribe) and share the fix here for the community.

Mar 22 '22 10:03 naveenmarthala
