[FR] Bring your own evaluation prompt
Willingness to contribute
No. I cannot contribute this feature at this time.
Proposal Summary
Users should be able to override the grading_system_prompt_template defined in https://github.com/mlflow/mlflow/blob/master/mlflow/metrics/genai/prompts/v1.py, because the current hardcoded Task section doesn't suit the needs of all users.
Motivation
What is the use case for this feature?
- Specific requirements for evaluation instructions.
- Complex inputs/outputs that use their own delimiters
Why is this use case valuable to support for MLflow users in general?
The user should be able to customise the evaluation instructions or change the formatting/delimiters to better suit the particular use case.
Why is this use case valuable to support for your project(s) or organization?
Because we have
- specific evaluation instructions
- complex inputs with their own embedded instructions, which need to be addressed properly in the main prompt instructions
Why is it currently difficult to achieve this use case?
Because the current evaluation prompt is hardcoded: the user can only change parts of it, and the main prompt itself doesn't fit some use cases (see the sketch below for what can and cannot be customised today).
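For context, a rough illustration of what can already be customised today through make_genai_metric (the definition, grading rubric, few-shot examples, and judge model), as opposed to the surrounding prompt template, which stays hardcoded. The model URI and wording below are illustrative only, not part of this request.

```python
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

# These pieces are user-supplied today...
metric = make_genai_metric(
    name="answer_quality",
    definition="Measures how well the answer addresses the question.",
    grading_prompt=(
        "Score 1: the answer is off-topic or incorrect.\n"
        "Score 5: the answer is correct, complete, and well formatted."
    ),
    examples=[
        EvaluationExample(
            input="What is MLflow?",
            output="MLflow is an open source platform for the ML lifecycle.",
            score=5,
            justification="Correct and concise.",
        )
    ],
    model="openai:/gpt-4",  # illustrative judge model URI
    greater_is_better=True,
)
# ...but they are interpolated into the fixed grading_system_prompt_template
# from mlflow/metrics/genai/prompts/v1.py, whose overall "Task" wording and
# delimiters cannot be replaced.
```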
Details
No response
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
cc @prithvikannan @dbczumar
Hi @alena-m, thank you for raising this. We agree that it would be valuable, and we would really appreciate a contribution for it. I'll add the help wanted label for now. Let us know if you'd like to reconsider and take this on.
cc @sunishsheth2009
Agreed, this would be super useful. The existing components for this are largely already present: make_metric() to create an arbitrary EvaluationMetric, and the deployments client's predict() to call the judge LLM. We would need to figure out an approach for users to define their own grading prompt and the corresponding parsing logic; a rough sketch of how that looks today is below.
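A minimal sketch of that workaround as it stands today, assuming a chat-style endpoint reachable through the deployments client; the endpoint name, prompt wording, and JSON response parsing are placeholders rather than anything MLflow prescribes.

```python
import json

import pandas as pd

from mlflow.deployments import get_deploy_client
from mlflow.metrics import MetricValue, make_metric

# Fully user-defined grading prompt (placeholder wording).
GRADING_PROMPT = """You are an impartial judge. Grade the answer from 1 to 5.
<question>
{question}
</question>
<answer>
{answer}
</answer>
Reply only with JSON: {{"score": <int>, "justification": "<string>"}}"""


def judge_eval_fn(predictions: pd.Series, targets: pd.Series, metrics) -> MetricValue:
    client = get_deploy_client("databricks")  # or any configured deployments target
    scores, justifications = [], []
    for question, answer in zip(targets, predictions):
        response = client.predict(
            endpoint="my-judge-endpoint",  # placeholder endpoint name
            inputs={
                "messages": [
                    {
                        "role": "user",
                        "content": GRADING_PROMPT.format(question=question, answer=answer),
                    }
                ]
            },
        )
        # Parsing logic is entirely user-defined; this assumes an OpenAI-style
        # chat response body.
        parsed = json.loads(response["choices"][0]["message"]["content"])
        scores.append(parsed["score"])
        justifications.append(parsed["justification"])
    return MetricValue(
        scores=scores,
        justifications=justifications,
        aggregate_results={"mean": sum(scores) / len(scores)},
    )


answer_quality = make_metric(eval_fn=judge_eval_fn, greater_is_better=True, name="answer_quality")
```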
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
I'd like to work on this issue.
My understanding is that make_genai_metric instantiates an EvaluationModel that has the grading_system_prompt_template hardcoded.
So, the user could provide an optional grading_system_prompt_template argument that is simply used in place of the hardcoded template. That way the option is exposed in a straightforward manner and the existing approach isn't broken (rough sketch below)!
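A rough sketch of what that could look like from the caller's side. Note that grading_system_prompt_template is the proposed new argument and does not exist in MLflow today, and the template text is only a placeholder.

```python
from mlflow.metrics.genai import make_genai_metric

# Placeholder template with user-chosen task wording and delimiters.
MY_TEMPLATE = """Task:
You are grading an answer to a complex, delimiter-heavy input.
Follow these organisation-specific instructions exactly.
<<INPUT>>
{input}
<<OUTPUT>>
{output}
{grading_instructions}
"""

metric = make_genai_metric(
    name="answer_quality",
    definition="Measures how well the answer addresses the question.",
    grading_prompt="Score 1-5 according to the internal rubric.",
    # Proposed optional argument: if provided, used instead of the hardcoded
    # template from mlflow/metrics/genai/prompts/v1.py; if omitted, behaviour
    # is unchanged.
    grading_system_prompt_template=MY_TEMPLATE,
)
```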
Would that fit the requirements?