fix(llmobs): deprecate and convert numerical metrics to score type

Open lievan opened this issue 1 year ago • 3 comments

The LLM Obs backend does not currently support ingesting the numerical metric type, so the SDK needs to be updated to:

  1. warn users not to submit this metric type, and
  2. submit any numerical metrics as the supported score metric type, for users who have already started submitting evaluation metrics with the numerical type.

Users can therefore keep calling submit_evaluation with the 'numerical' type; under the hood, the metric is simply converted to the score type.
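
The conversion described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in `ddtrace/llmobs/_llmobs.py`: the helper name `normalize_metric_type`, the `SUPPORTED_METRIC_TYPES` tuple (including the 'categorical' entry), and the warning text are all assumptions made for the example.

```python
import warnings

# Assumption: the set of types the backend accepts. Only 'score' is
# confirmed by this PR description; 'categorical' is included for illustration.
SUPPORTED_METRIC_TYPES = ("categorical", "score")


def normalize_metric_type(metric_type: str) -> str:
    """Hypothetical helper: map the deprecated 'numerical' type to 'score'."""
    normalized = metric_type.lower()
    if normalized == "numerical":
        # Warn the caller, then fall back to the supported type so that
        # existing submissions keep working.
        warnings.warn(
            "The 'numerical' metric type is deprecated; "
            "submitting it as 'score' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "score"
    if normalized not in SUPPORTED_METRIC_TYPES:
        raise ValueError(f"Unsupported metric type: {metric_type!r}")
    return normalized
```

With a helper like this, a call that passes 'numerical' still succeeds and is forwarded as 'score', while the caller sees a deprecation warning.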

Checklist

  • [x] Change(s) are motivated and described in the PR description
  • [x] Testing strategy is described if automated tests are not included in the PR
  • [x] Risks are described (performance impact, potential for breakage, maintainability)
  • [x] Change is maintainable (easy to change, telemetry, documentation)
  • [x] Library release note guidelines are followed or label changelog/no-changelog is set
  • [x] Documentation is included (in-code, generated user docs, public corp docs)
  • [x] Backport labels are set (if applicable)
  • [x] If this PR changes the public interface, I've notified @DataDog/apm-tees.

Reviewer Checklist

  • [x] Title is accurate
  • [x] All changes are related to the pull request's stated goal
  • [x] Description motivates each change
  • [x] Avoids breaking API changes
  • [x] Testing strategy adequately addresses listed risks
  • [x] Change is maintainable (easy to change, telemetry, documentation)
  • [x] Release note makes sense to a user of the library
  • [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • [x] Backport labels are set in a manner that is consistent with the release branch maintenance policy

lievan, Jun 26 '24 21:06

Datadog Report

Branch report: evan.li/remove-numeric-supp Commit report: a30aacb Test service: dd-trace-py

:white_check_mark: 0 Failed, 774 Passed, 39698 Skipped, 33m 56.92s Total duration (41m 56.42s time saved)

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 27.06%. Comparing base (deadfcd) to head (70c625d). Report is 41 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ddtrace/llmobs/_llmobs.py | 0.00% | 4 Missing :warning: |
| tests/llmobs/test_llmobs_service.py | 0.00% | 3 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #9658       +/-   ##
===========================================
- Coverage   75.61%   27.06%   -48.55%     
===========================================
  Files        1336     1365       +29     
  Lines      125991   127491     +1500     
===========================================
- Hits        95271    34511    -60760     
- Misses      30720    92980    +62260     

codecov-commenter, Jun 26 '24 21:06

Benchmarks

Benchmark execution time: 2024-07-05 15:04:00

Comparing candidate commit f34d219a87660d23cf752933e6c990a78595f3b3 in PR branch evan.li/remove-numeric-supp with baseline commit f9edeed4205d6ab854aea3d8d23b0cba26f7714d in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 221 metrics, 9 unstable metrics.

pr-commenter[bot], Jun 27 '24 16:06