[FR] evaluate metrics inside a prompt engineering UI run
Willingness to contribute
Yes. I would be willing to contribute this feature with guidance from the MLflow community.
Proposal Summary
I would like to know whether we can have a way to perform metrics-based evaluation on a prompt engineering UI run. I've seen in the documentation that this can be done by calling `mlflow.evaluate()` on data obtained via `mlflow.load_table()` from a specific prompt engineering UI run. It would be useful to be able to choose metrics directly when creating a new prompt engineering UI run and have them evaluated later.
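For reference, a minimal sketch of the current manual workflow described above, assuming the prompt engineering UI run logs its table as `eval_results_table.json` and that the generated outputs live in an `output` column (the run ID and column name are placeholders):

```python
import mlflow

# Load the table that the prompt engineering UI run logged as an artifact.
# The artifact file name and run ID below are placeholders.
df = mlflow.load_table(
    "eval_results_table.json",
    run_ids=["<prompt-engineering-run-id>"],
)

# Evaluate the logged outputs as a static dataset (no model object needed).
# The "output" column name and the model_type are assumptions; adjust them
# to match the columns in your table.
with mlflow.start_run():
    results = mlflow.evaluate(
        data=df,
        predictions="output",
        model_type="text",
    )
    print(results.metrics)
```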
Motivation
What is the use case for this feature?
Why is this use case valuable to support for MLflow users in general?
Why is this use case valuable to support for your project(s) or organization?
Why is it currently difficult to achieve this use case?
Details
No response
What component(s) does this bug affect?
- [ ] `area/artifacts`: Artifact stores and artifact logging
- [ ] `area/build`: Build and test infrastructure for MLflow
- [ ] `area/deployments`: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] `area/docs`: MLflow documentation pages
- [ ] `area/examples`: Example code
- [ ] `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] `area/models`: MLmodel format, model serialization/deserialization, flavors
- [ ] `area/recipes`: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] `area/projects`: MLproject format, project running backends
- [ ] `area/scoring`: MLflow Model server, model deployment tools, Spark UDFs
- [ ] `area/server-infra`: MLflow Tracking server backend
- [ ] `area/tracking`: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] `area/uiux`: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] `area/docker`: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] `area/sqlalchemy`: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] `area/windows`: Windows support
What language(s) does this bug affect?
- [ ] `language/r`: R APIs and clients
- [ ] `language/java`: Java APIs and clients
- [ ] `language/new`: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] `integrations/azure`: Azure and Azure ML integrations
- [ ] `integrations/sagemaker`: SageMaker integrations
- [ ] `integrations/databricks`: Databricks integrations
This sounds reasonable to me. cc @daniellok-db
Currently we don't evaluate anything; we just combine model inputs, outputs, and some parameters into the table. To support evaluation, we would likely need more inputs than just the metrics, for example a `ground_truth` field.
cc @prithvikannan @hubertzub-db
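For illustration, a hedged sketch of what such an evaluation could look like if the logged table also carried a ground-truth column (the column names and `model_type` here are assumptions, not the current table schema):

```python
import mlflow
import pandas as pd

# Hypothetical shape of an evaluation table extended with ground truth;
# the column names are assumptions.
df = pd.DataFrame(
    {
        "question": ["What is MLflow?"],
        "output": ["MLflow is an open source platform for the ML lifecycle."],
        "ground_truth": ["MLflow is an open source MLOps platform."],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=df,
        predictions="output",
        targets="ground_truth",
        model_type="question-answering",
    )
    print(results.metrics)
```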
> This sounds reasonable to me. cc @daniellok-db
> Currently we don't evaluate anything; we just combine model inputs, outputs, and some parameters into the table. To support evaluation, we would likely need more inputs than just the metrics, for example a `ground_truth` field.
Yes, basically it is `mlflow.evaluate` inside the prompt engineering UI, specifically for LLM evaluation.
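To make that concrete, here is a sketch of the manual LLM-evaluation path such a UI feature would automate. The artifact name, run ID, `output` and `ground_truth` columns, and the OpenAI-backed judge model are all assumptions, and the judge requires the corresponding API credentials to be configured:

```python
import mlflow
from mlflow.metrics.genai import answer_similarity

# Re-load the prompt engineering run's logged table (placeholders below).
df = mlflow.load_table(
    "eval_results_table.json",
    run_ids=["<prompt-engineering-run-id>"],
)

# Score the logged outputs with an LLM-judge metric in addition to the
# built-in question-answering metrics.
with mlflow.start_run():
    results = mlflow.evaluate(
        data=df,
        predictions="output",
        targets="ground_truth",
        model_type="question-answering",
        extra_metrics=[answer_similarity(model="openai:/gpt-4")],
    )
    print(results.tables["eval_results_table"])
```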
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.