Support different input and output types for Model Monitor data quality monitoring jobs

Open caitriggs opened this issue 2 years ago • 1 comments

Discussed in https://github.com/aws/sagemaker-python-sdk/discussions/2393

Currently, when capturing data for a scheduled Model Monitor job, the data inputs and outputs must be encoded using the same content type. Otherwise, the following error occurs:

Error: Encoding mismatch: Encoding is CSV for endpointInput, but Encoding is JSON for endpointOutput. We currently only support the same type of input and output encoding at the moment.

An example of what captureData contains for a ModelMonitor.list_executions() call where the input is CSV and the output is set to JSON: {"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0.5629733055182888,0.3018707225866159,0.5824503894753207","encoding":"CSV"},"endpointOutput":{"observedContentType":"application/json","mode":"OUTPUT","data":"{\"predictions\": [{\"score\": 0.012620825320482254, \"predicted_label\": 0}]}","encoding":"JSON"}},"eventMetadata":{"eventId":"28cc8646-bb47-4a96-92fc-d04fc2651286","inferenceTime":"2021-11-30T04:32:05Z"},"eventVersion":"0"}
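To make the mismatch concrete, here is a minimal sketch that reads one capture record and compares the two `encoding` fields. The helper names `capture_encodings` and `has_encoding_mismatch` are hypothetical and not part of the SageMaker SDK, and the sample record is an abbreviated copy of the one above.

```python
import json


def capture_encodings(record_line):
    """Return (input_encoding, output_encoding) for one capture record line."""
    capture = json.loads(record_line)["captureData"]
    return (
        capture["endpointInput"]["encoding"],
        capture["endpointOutput"]["encoding"],
    )


def has_encoding_mismatch(record_line):
    """True when the input and output of a record use different encodings."""
    input_encoding, output_encoding = capture_encodings(record_line)
    return input_encoding != output_encoding


# Abbreviated record with the same shape as the captured example.
sample_record = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0.56,0.30,0.58",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": "{\"predictions\": [{\"score\": 0.25, \"predicted_label\": 0}]}",
            "encoding": "JSON",
        },
    },
    "eventVersion": "0",
})

print(capture_encodings(sample_record))
print(has_encoding_mismatch(sample_record))
```

Running a check like this over the captured files before scheduling a job would show whether the records would trigger the error, under the assumption that the job compares exactly these two fields.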

Please support different input and output content types for Model Monitor data quality monitoring jobs, because there is also no apparent way in the SageMaker examples or documentation to set both the input and output of the endpoint to the same encoding.

Setting a serializer and deserializer to the same content type during deployment of the endpoint does not appear to work. The endpoint continues to only set endpointOutput to JSON.

# data capture config object
data_capture_config = sagemaker.model_monitor.DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    capture_options=["REQUEST", "RESPONSE"],
    csv_content_types=["text/csv"],
    destination_s3_uri=s3_capture_upload_path,
    sagemaker_session=sm_sess
)

# deploy the endpoint with data capture enabled
model.deploy(
    initial_instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    model_name=model_name,
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.CSVDeserializer(accept="application/json"),
    tags=[{'Key': 'demo-configs', 'Value': prefix}]
)

This results in any scheduled data quality monitoring job failing with that same "Encoding mismatch" error.
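Until mixed encodings are supported, one conceivable stopgap is to rewrite the captured records offline so that both sides are CSV. The sketch below is only an illustration under assumptions: the helpers `json_predictions_to_csv` and `rewrite_output_as_csv` are hypothetical, the field names are taken from the captured example, and it is not confirmed that Model Monitor accepts capture files rewritten this way.

```python
import json


def json_predictions_to_csv(output_data, fields=("score", "predicted_label")):
    """Flatten a {"predictions": [...]} JSON payload into CSV rows."""
    payload = json.loads(output_data)
    rows = []
    for prediction in payload["predictions"]:
        rows.append(",".join(str(prediction[name]) for name in fields))
    return "\n".join(rows)


def rewrite_output_as_csv(record_line):
    """Return a copy of a capture record whose JSON output is re-encoded as CSV."""
    record = json.loads(record_line)
    output = record["captureData"]["endpointOutput"]
    if output["encoding"] == "JSON":
        output["data"] = json_predictions_to_csv(output["data"])
        output["encoding"] = "CSV"
        output["observedContentType"] = "text/csv"
    return json.dumps(record)


# Usage with a small payload shaped like the captured endpoint output.
print(json_predictions_to_csv('{"predictions": [{"score": 0.25, "predicted_label": 0}]}'))
```

Records whose output is already CSV pass through unchanged, so the rewrite can be applied to a whole capture file without checking each line first.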

caitriggs, Dec 04 '21 00:12

Did you get a resolution for this, @caitriggs?

saivinilpratap-ta, Sep 15 '22 17:09