MLServer
Passing dataframe index for pandas requests
When trying to run inference on MLServer I get the error below. I think MLServer is failing to convert the InferenceRequest to an MLflow-compatible input.
mlflow.exceptions.MlflowException: Expected input to be DataFrame or list. Found: InferenceRequest
Hey @BFAnas ,
My guess would be that the payload is missing the right content type. This is used by MLServer to convert the request to the right Python type (e.g. dataframe, tensor, dict of tensors, etc.).
Could you try adding a "parameters": {"content_type": "pd"} field at the top level of your payload? This will convert the request to a DataFrame (which seems to be what your model requires?). Alternatively, you can also try to set the right MLflow model signature, so that MLServer can infer the content type from there.
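For reference, a minimal V2 payload with that top-level content type would look something like the snippet below (the model input name `price` and the values are made up for illustration; the field names follow the V2 Inference Protocol):

```python
import json

# Minimal V2 inference payload asking MLServer to decode the whole
# request as a pandas DataFrame. Each entry in "inputs" becomes a column.
payload = {
    "parameters": {"content_type": "pd"},  # decode request as a DataFrame
    "inputs": [
        {
            "name": "price",       # hypothetical column name
            "datatype": "FP64",
            "shape": [3],
            "data": [1.5, 2.0, 2.5],
        }
    ],
}

print(json.dumps(payload, indent=2))
```

This payload can then be POSTed to the model's V2 inference endpoint as usual.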
@adriangonz Thank you for the swift response. I'm trying to implement it, but I've run into a problem that I'm not able to solve. I want to pass a DataFrame with a datetime index, but I don't see a clear way of passing an index; I tried to pass it as a column but it didn't work. I'm still working on it, but if you know how to do it please let me know. Once I try your solution about the content type I'll return with the feedback.
Hey @BFAnas ,
Similarly to how you can ask MLServer to treat the whole request as a multi-column dataframe, you can also specify that some of your columns are of particular types. In your case, alongside the pd content type at the request top-level, you could also add a "parameters": {"content_type": "datetime"} field for the inputs which have to be transformed into a datetime.
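Putting the two together, the payload could look roughly like this sketch (input names are invented, and the use of `BYTES` for the timestamp strings is my assumption about how the strings would be shipped over the wire):

```python
# Sketch of a V2 payload combining a request-level "pd" content type with
# a per-input "datetime" content type for the timestamp column.
payload = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {
            "name": "timestamp",   # hypothetical column name
            "datatype": "BYTES",
            "shape": [2],
            "data": ["2021-01-01T00:00:00", "2021-01-02T00:00:00"],
            # per-input content type: decode these strings as datetimes
            "parameters": {"content_type": "datetime"},
        },
        {
            "name": "value",       # hypothetical column name
            "datatype": "FP64",
            "shape": [2],
            "data": [10.0, 20.0],
        },
    ],
}
```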
Regarding the index, currently we just build the dataframe as pd.DataFrame(data), which I guess just defaults to using the default row number index.
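In other words, the decoded request ends up equivalent to something like this, where pandas falls back to the default RangeIndex because no explicit index is given:

```python
import pandas as pd

# With no index argument, pandas assigns a default RangeIndex (0..n-1),
# which is effectively what a request decoded via pd.DataFrame(data) gets.
data = {"timestamp": ["2021-01-01", "2021-01-02"], "value": [10.0, 20.0]}
df = pd.DataFrame(data)

print(df.index)  # RangeIndex(start=0, stop=2, step=1)
```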
Hi @adriangonz, thanks for your comment. Yes, I use datetime for the content_type, but it's the index that I need to pass along with the columns, because I want to do some preprocessing of the time series inside the model. I guess I can create a new column with the index and then, in the model's preprocessing, set the index again based on that column. I was hoping there's a cleaner way to do it.
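That column-round-trip workaround can be sketched as follows (the column name `timestamp` and the split between "client side" and "model side" are just illustrative):

```python
import pandas as pd

# Client side: flatten the datetime index into a regular column so it
# survives the request/response round trip.
df = pd.DataFrame(
    {"value": [10.0, 20.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)
payload_df = df.reset_index().rename(columns={"index": "timestamp"})

# Model side (e.g. inside a custom preprocessing step): pop the column
# back out and reinstate it as the datetime index.
restored = payload_df.set_index(pd.to_datetime(payload_df.pop("timestamp")))
```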
Right, got it.
I think this could be a useful feature. Would you mind changing the issue title to something along the lines of "Selecting column as index for pandas requests"? We already have a few other things prioritised though, i.e. any help from the community is always welcome!
In the meantime, as you say, the best workaround is probably to add some custom logic which handles the indexing.
How about "Passing dataframe index for pandas requests"? Because ideally we want to be able to pass a DataFrame with its index. BTW it is possible to do it in mlflow, with dataframe.to_json(orient="split"): it takes the index into account as well, and if it is a datetime index it'll convert it automatically to int.
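The behaviour being referred to can be seen directly in pandas: with orient="split", to_json serialises columns, index and data as separate arrays, and a DatetimeIndex is written out as epoch milliseconds by default.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {"value": [10.0, 20.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)

# orient="split" keeps the index as its own array; datetimes are
# serialised as epoch milliseconds by default.
parsed = json.loads(df.to_json(orient="split"))

print(parsed["index"])  # [1609459200000, 1609545600000]
```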
Sure thing @BFAnas , that title sounds great to me!
MLServer speaks the V2 Inference Protocol, so I expect the solution will be along the lines of passing the index as another entry of the inputs array, and just flag it as the "index" through a new field in the parameters object of that particular input entry.
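To make that idea concrete, server-side decoding could look roughly like the sketch below. None of this is implemented in MLServer today; the "index" parameter name, the simplified input dicts, and the decoding logic are all hypothetical.

```python
import pandas as pd

# Hypothetical: request inputs where one entry is flagged as the index
# via a new "index" field in its parameters object.
request_inputs = [
    {
        "name": "timestamp",
        "data": ["2021-01-01", "2021-01-02"],
        "parameters": {"content_type": "datetime", "index": True},
    },
    {"name": "value", "data": [10.0, 20.0]},
]

# Sketch of the decoding step: build the DataFrame from all inputs,
# then promote the flagged entry to be the (datetime) index.
columns = {entry["name"]: entry["data"] for entry in request_inputs}
index_names = [
    entry["name"]
    for entry in request_inputs
    if entry.get("parameters", {}).get("index")
]

df = pd.DataFrame(columns)
if index_names:
    df = df.set_index(index_names[0])
    df.index = pd.to_datetime(df.index)
```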