Add support for nan values in PandasCodec

Open Pappol opened this issue 1 year ago • 7 comments

PandasCodec does not handle null values well. The can_encode method is misleading because it only checks whether the payload is a DataFrame, not whether its contents can actually be encoded.

Pappol avatar May 07 '24 12:05 Pappol

Thanks for raising this @Pappol

JSON serialization is wrong as well:

import pandas as pd
from mlserver.codecs.pandas import PandasCodec

df = pd.DataFrame({'foo': [None, 1.0]})
PandasCodec.encode_request(df).json()

serialized request:

{
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
            "name": "foo",
            "shape": [
                2,
                1
            ],
            "datatype": "FP64",
            "data": [
                NaN,
                1.0
            ]
        }
    ]
}
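To illustrate why the bare NaN in the output above is a problem: NaN is not a valid JSON token, yet Python's standard json module emits it by default, so strict parsers on the receiving side may reject the payload. The following is a minimal standard-library sketch, independent of MLServer; the payload shape and the helper name `nan_to_null` are assumptions made for illustration only.

```python
import json
import math

# Illustrative payload mirroring the "data" field above (not an MLServer object).
payload = {"data": [float("nan"), 1.0]}

# By default, json.dumps emits the non-standard token NaN.
lenient = json.dumps(payload)

# With allow_nan=False, the same payload is rejected with a ValueError.
try:
    json.dumps(payload, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc


def nan_to_null(values):
    """Hypothetical helper: swap float NaN for None so it serializes as null."""
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


# After the replacement, strict serialization succeeds.
compliant = json.dumps({"data": nan_to_null(payload["data"])}, allow_nan=False)

print(lenient)
print(compliant)
```

Replacing NaN with None before serializing is the same idea as the workaround that follows in this comment, just applied to a plain list rather than to an InferenceRequest.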

By the way, in case anyone else is facing the same issue, this is my quick-and-dirty way to handle it:

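import pandas as pd
from mlserver.types import InferenceRequest


def replace_nan_with_none(inference_request: InferenceRequest) -> InferenceRequest:
    """Replace NaN-like entries in every input's data with None, in place."""
    for request_input in inference_request.inputs:
        # `__root__` is the pydantic v1 custom-root attribute holding the raw list.
        values = request_input.data.__root__
        for idx, value in enumerate(values):
            # pd.isna is True for NaN, None, and NaT.
            if pd.isna(value):
                values[idx] = None
    return inference_request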

sp1thas avatar Jun 13 '24 22:06 sp1thas

The main issue is with date data types.

Pappol avatar Jun 14 '24 04:06 Pappol

Hi @sp1thas -- Thanks for bringing this up and for showing your workaround. I will assign this to myself and have a look at what exactly is causing this.

@Pappol -- Do you have a reproducible example of the behaviour you are experiencing?

ramonpzg avatar Jul 24 '24 11:07 ramonpzg

Hey @ramonpzg, I've also taken a look in the meantime and opened #1893. Could you review it? Looking forward to your input.

sp1thas avatar Aug 28 '24 13:08 sp1thas

The serialization issue with np.nan has been addressed as of 1.4.0 by https://github.com/SeldonIO/MLServer/pull/1346.

sp1thas avatar Sep 05 '24 18:09 sp1thas

That change only fixes the problem with NaN values. As soon as any text data contains a None value, encoding breaks; more generally, the problem occurs whenever non-numeric values are passed. For instance, I have a df that contains text values, some of which can be None. A fix would be to check for float values in mlserver/codecs/numpy.py, line 109: if isinstance(val, float) and np.isnan(val)
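A minimal sketch of that guard, shown outside MLServer: np.isnan raises a TypeError for strings and None, so the isinstance check has to come first. The helper names `is_float_nan` and `replace_float_nan` are hypothetical and only illustrate the condition; this is not the actual code in mlserver/codecs/numpy.py.

```python
import numpy as np


def is_float_nan(val) -> bool:
    """True only for float NaN; the isinstance check short-circuits so
    np.isnan is never called on str or None (which would raise TypeError)."""
    return isinstance(val, float) and bool(np.isnan(val))


def replace_float_nan(values):
    """Hypothetical helper: map float NaN to None, leave everything else as is."""
    return [None if is_float_nan(v) else v for v in values]


# Mixed column: text, a missing text value, a float NaN, and a regular float.
mixed = ["a", None, float("nan"), 1.5]
cleaned = replace_float_nan(mixed)

# The unguarded call fails on non-numeric input.
try:
    np.isnan("a")
    unguarded_error = None
except TypeError as exc:
    unguarded_error = exc

print(cleaned)
```

One caveat of this approach: np.float64 subclasses Python's float and passes the guard, but np.float32 does not, so a float32 NaN would be left untouched.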

bwallima avatar Oct 14 '24 12:10 bwallima

@bwallima Many thanks for the comment. We welcome contributions to MLServer, so feel free to raise a PR with this change.

sakoush avatar Oct 23 '24 07:10 sakoush