[Feature / Docs] The "Deploying with MLflow" section of the docs is somewhat misleading
What feature would you like to see?
Description
The DSPy deployment docs recommend using MLflow to package and serve a program. They note that specifying the task "llm/v1/chat" when logging to MLflow will deploy the program and allow it to take "input and generate output in the same format as the OpenAI chat API".
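For context, this is roughly the logging flow the docs describe (a minimal sketch; the program, input example, and artifact name are placeholders, and exact arguments may vary by MLflow version):

```python
import dspy
import mlflow

# Configure a simple DSPy program (placeholder LM and signature).
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
program = dspy.ChainOfThought("question -> answer")

with mlflow.start_run():
    # task="llm/v1/chat" is what the docs say enables OpenAI-chat-style I/O.
    mlflow.dspy.log_model(
        program,
        "dspy_program",
        task="llm/v1/chat",
        input_example={"messages": [{"role": "user", "content": "What is DSPy?"}]},
    )
```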
However, this deployment option has some significant limitations that the docs do not mention:
The first is that the MLflow model serving specification dictates that inference can only be executed through the invocations endpoint. Most applications that provide compatibility with the OpenAI chat API use some form of the OpenAI client, with the option to change base_url to point at the compatible endpoint. As far as I'm aware, there is no way to make the OpenAI client target the invocations endpoint, or to make MLflow model serving expose a chat/completions endpoint. Given this, anyone who wants OpenAI-client compatibility with DSPy models deployed on MLflow would need to deploy an additional server to proxy requests from chat/completions to invocations.
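To make the mismatch concrete, here is a minimal sketch (host, port, and payload are hypothetical): MLflow serving accepts POST /invocations, while the OpenAI client always appends /chat/completions to its base_url.

```python
import requests
from openai import OpenAI

payload = {"messages": [{"role": "user", "content": "What is DSPy?"}]}

# Works: MLflow model serving only exposes the invocations endpoint.
resp = requests.post("http://127.0.0.1:6000/invocations", json=payload)
print(resp.json())

# Does not work: the OpenAI client builds {base_url}/chat/completions,
# and there is no option to point it at /invocations instead.
client = OpenAI(base_url="http://127.0.0.1:6000", api_key="unused")
client.chat.completions.create(model="dspy_program", messages=payload["messages"])
```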
The second notable limitation is that the output format is not actually the same as the OpenAI chat API. DSPy models deployed via MLflow return only a choices array containing the ChatCompletion choices. The OpenAI Python client expects several additional fields here, and since they are not provided, output parsing should fail.
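Roughly, the shapes look like this (field values are illustrative; the point is the missing top-level fields such as id, object, created, and model that the client's ChatCompletion model requires):

```python
# What the served DSPy model returns today (approximately): just the choices.
served_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
            "finish_reason": "stop",
        }
    ]
}

# What the OpenAI Python client expects to parse into a ChatCompletion object.
openai_style_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "dspy_program",
    "choices": served_response["choices"],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}
```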
Given the above, I think it is fair to say that the compatibility with the OpenAI chat API is far weaker than implied.
Potential Solution
MLflow provides AI Gateway as a feature to proxy requests to different supported backends, and since MLflow model serving is a supported backend provider, this could technically be used to solve both of the problems above and create an interface that is closer in format to the OpenAI chat API.
To support this, the MLflow DSPy wrapper would need some changes to get the output format correct so that AI Gateway can parse it, but once that is done, the issues above are resolved without the need for a custom chat/completions-to-invocations proxy. If this is done, the docs should be updated to cover deploying AI Gateway as well.
Please let me know if I'm missing something regarding how DSPy and MLflow interact for deployments.
Would you like to contribute?
- [x] Yes, I'd like to help implement this.
- [ ] No, I just want to request it.
Additional Context
No response
@TomeHirata Mind taking a look?
Thank you for the report. The MLflow AI Gateway is designed to provide a proxy for LLM providers such as Anthropic or OpenAI, rather than for MLflow model serving. I don't think the use of the invocations endpoint is a problem, as the full path should be configurable with client SDKs such as LiteLLM (see https://docs.litellm.ai/docs/providers/custom). I agree that the DSPy chat model returning only a list of choices is misleading. Let me change either the docs or the DSPy flavor implementation.
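For reference, a rough, untested sketch of what that could look like with LiteLLM's custom provider (model name, host, and port are placeholders, and the request/response formats would still need to line up with what MLflow serving expects):

```python
import litellm

response = litellm.completion(
    model="custom/dspy_program",
    messages=[{"role": "user", "content": "What is DSPy?"}],
    # The full path, including /invocations, is passed as the api_base.
    api_base="http://127.0.0.1:6000/invocations",
)
print(response)
```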