
[FR] Bring your own evaluation prompt

Open · alena-m opened this issue 1 year ago · 6 comments

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

Users should be able to override the grading_system_prompt_template defined in https://github.com/mlflow/mlflow/blob/master/mlflow/metrics/genai/prompts/v1.py, because the current hardcoded "Task" section does not suit the needs of all users.

Motivation

What is the use case for this feature?

  • Specific requirements for evaluation instructions.
  • Complex inputs/outputs with their own delimiters and formatting.

Why is this use case valuable to support for MLflow users in general?

The user should be able to customise the evaluation instructions or change the formatting/delimiters to better suit the particular use case.

Why is this use case valuable to support for your project(s) or organization?

Because we have:

  • specific evaluation instructions
  • complex inputs with their own instructions, which need to be addressed properly in the main prompt instructions

Why is it currently difficult to achieve this use case?

Because the current evaluation prompt is hardcoded: the user can only change parts of it, and the overall prompt does not fit some use cases, as illustrated below.
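To illustrate what can be customised today, a minimal sketch of make_genai_metric usage follows; only the fragments passed as arguments are interpolated into the fixed template, and the metric name, rubric text, and judge model URI below are placeholders:

```python
from mlflow.metrics.genai import make_genai_metric

# Only these fragments are injected into the hardcoded
# grading_system_prompt_template; the surrounding "Task" text, formatting,
# and delimiters stay fixed.
answer_quality = make_genai_metric(
    name="answer_quality",  # placeholder metric name
    definition="How well the answer addresses the question.",
    grading_prompt=(
        "Score 1: the answer is irrelevant or wrong.\n"
        "Score 5: the answer is correct, complete, and well written."
    ),
    model="openai:/gpt-4",  # placeholder judge model URI
    parameters={"temperature": 0.0},
    greater_is_better=True,
)
```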

Details

No response

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

alena-m avatar Jan 10 '24 13:01 alena-m

cc @prithvikannan @dbczumar

BenWilson2 avatar Jan 11 '24 00:01 BenWilson2

Hi @alena-m, thank you for raising this. We agree that it would be valuable, and we would really appreciate a contribution for it. I'll add the help wanted label for now. Let us know if you'd like to reconsider and take this on.

dbczumar avatar Jan 11 '24 01:01 dbczumar

cc @sunishsheth2009

dbczumar avatar Jan 11 '24 01:01 dbczumar

Agreed, this would be super useful. The existing components for this are largely already present: make_metric() to create an arbitrary EvaluationMetric, and the deployments client's predict() to call the LLM. We would need to figure out an approach for users to define their own grading prompt and the corresponding parsing logic.
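For reference, a rough sketch of how those pieces could be wired together today, assuming a Databricks-served chat endpoint named `my-llm-endpoint` and a hand-written grading prompt (both hypothetical); the response shape and the parsing of the judge's output are deliberately naive and depend on the endpoint:

```python
import pandas as pd

from mlflow.deployments import get_deploy_client
from mlflow.metrics import MetricValue, make_metric

# Hypothetical, fully user-controlled grading prompt; replace the
# instructions and delimiters with your own.
GRADING_PROMPT = (
    "You are a strict grader. Score the answer from 1 to 5 and justify it.\n"
    "Return two lines: 'score: <int>' and 'justification: <text>'.\n\n"
    "### Answer\n{answer}\n"
)


def _grade_one(client, answer):
    # Call the judge model through the MLflow Deployments client.
    response = client.predict(
        endpoint="my-llm-endpoint",  # hypothetical endpoint name
        inputs={
            "messages": [
                {"role": "user", "content": GRADING_PROMPT.format(answer=answer)}
            ]
        },
    )
    # Naive parsing of an OpenAI-style chat response; real code would be
    # more defensive and match the actual endpoint's schema.
    text = response["choices"][0]["message"]["content"]
    score_line = next(l for l in text.splitlines() if l.lower().startswith("score"))
    return int(score_line.split(":", 1)[1].strip()), text


def _eval_fn(predictions: pd.Series, targets: pd.Series, metrics) -> MetricValue:
    # Signature follows MLflow's custom-metric contract; `targets` and the
    # built-in `metrics` are unused in this minimal sketch.
    client = get_deploy_client("databricks")
    scores, justifications = [], []
    for answer in predictions:
        score, justification = _grade_one(client, answer)
        scores.append(score)
        justifications.append(justification)
    return MetricValue(
        scores=scores,
        justifications=justifications,
        aggregate_results={"mean": sum(scores) / len(scores)},
    )


custom_llm_judge = make_metric(
    eval_fn=_eval_fn, greater_is_better=True, name="custom_llm_judge"
)
```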

prithvikannan avatar Jan 11 '24 02:01 prithvikannan

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

github-actions[bot] avatar Jan 18 '24 00:01 github-actions[bot]

I'd like to work on this issue.


My understanding is that make_genai_metric instantiates an EvaluationModel that has the grading_system_prompt_template hardcoded. The user could therefore pass an optional grading_system_prompt_template argument that is used in place of the built-in template when provided. That keeps the option simple and doesn't break the existing behaviour.

Would that fit the requirements?
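For concreteness, a minimal sketch of the proposed surface, where grading_system_prompt_template is the new optional argument (hypothetical, not part of the current API), and the built-in v1 template is used when it is omitted; the placeholder names in the template are assumptions and would have to match whatever the EvaluationModel interpolates:

```python
from mlflow.metrics.genai import make_genai_metric

# Hypothetical user-supplied template with custom structure and delimiters.
MY_TEMPLATE = """\
<<TASK>>
Grade the output below according to: {definition}

<<RUBRIC>>
{grading_prompt}

<<INPUT>>
{input}

<<OUTPUT>>
{output}
"""

metric = make_genai_metric(
    name="answer_quality",
    definition="How well the answer addresses the question.",
    grading_prompt="Score 1 = poor ... Score 5 = excellent.",
    model="openai:/gpt-4",
    # Proposed new argument; falls back to the built-in v1 template when None.
    grading_system_prompt_template=MY_TEMPLATE,
)
```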

Cokral avatar Feb 09 '24 14:02 Cokral