[FR] Grading Notes
Willingness to contribute
Yes. I would be willing to contribute this feature with guidance from the MLflow community.
Proposal Summary
Hello everyone.
This blog from Databricks introduces the idea of grading notes: https://www.databricks.com/blog/enhancing-llm-as-a-judge-with-grading-notes.
The idea is that, for many use cases, writing full reference answers as a basis for comparison (for LLM-as-a-judge) is very time-consuming. Giving the judge short notes on what to look out for is still manual work, but it is much easier.
Since I can relate to this idea a lot and find the solution simple and yet very useful, I would like to work on this and make a PR.
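To make the idea concrete, here is a minimal sketch of how a grading note could be folded into a judge prompt. Everything in it is an assumption for illustration: `GradedExample`, `build_judge_prompt`, `parse_verdict`, and the template wording are hypothetical names, not part of MLflow or the Databricks blog, and the call to the judge LLM itself is left out.

```python
from dataclasses import dataclass

# Hypothetical prompt template; the wording is an assumption,
# not taken from the blog post or from MLflow.
JUDGE_TEMPLATE = (
    "You are grading an answer to a question.\n\n"
    "Question:\n{question}\n\n"
    "Grading note (what a good answer must cover):\n{grading_note}\n\n"
    "Candidate answer:\n{answer}\n\n"
    "Reply with PASS if the answer satisfies the grading note, otherwise FAIL, "
    "followed by a one-sentence justification."
)


@dataclass(frozen=True)
class GradedExample:
    """One evaluation item: a question plus a short note instead of a full reference answer."""
    question: str
    grading_note: str


def build_judge_prompt(example: GradedExample, answer: str) -> str:
    """Fill the template with the question, the grading note, and the answer under test."""
    return JUDGE_TEMPLATE.format(
        question=example.question,
        grading_note=example.grading_note,
        answer=answer,
    )


def parse_verdict(judge_reply: str) -> bool:
    """Return True when the judge's reply starts with PASS (case-insensitive)."""
    return judge_reply.strip().upper().startswith("PASS")


# Example usage with a code-generation task.
example = GradedExample(
    question="Write a function that removes duplicates from a list while preserving order.",
    grading_note="Must keep the first occurrence of each item; should avoid quadratic lookups.",
)
prompt = build_judge_prompt(example, "def dedupe(xs): return list(dict.fromkeys(xs))")
```

The note replaces the reference answer entirely: the judge only needs to check the candidate against a few stated criteria, which is what makes this cheaper to author than fully working reference code.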
Motivation
What is the use case for this feature?
This is used for LLM output evaluation.
Why is this use case valuable to support for MLflow users in general?
MLflow provides multiple evaluation options, and this idea extends them by allowing human preference to guide the LLM judge.
Why is this use case valuable to support for your project(s) or organization?
For code generation work that I have done, providing high-level guidelines to the LLM judge on what a task's solution should look like is much more scalable than writing out fully working code as a reference.
Why is it currently difficult to achieve this use case?
It isn't "difficult" per se, but providing support for it out of the box will 1) make it even simpler to handle and 2) expose the idea to people who may not have discovered it otherwise.
Details
No response
What component(s) does this bug affect?
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
What language(s) does this bug affect?
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations