
feat: migrate existing profiler metrics to generic metrics [MD-300] [MD-301]

Open azhou-determined opened this issue 11 months ago • 0 comments

Description

This PR migrates the existing profiling metrics (system metrics only) in trial_profiler_metrics to generic metrics, and changes the existing profiler-related APIs so that the old APIs are shimmed to fit the new schema.

The trial_profiler_metrics table and its data are not dropped at this time, in case we need to roll back. The table will be dropped after this feature lands.

Test Plan

You need access to the database you are testing against to confirm that the data migration ran successfully. As a rough check, the number of distinct trial IDs should be equal in the old table and the new table:

select count(distinct labels->>'trialId') from trial_profiler_metrics;
select count(distinct trial_id) from metrics where partition_type='PROFILING';
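
If the two counts differ, a set difference can show which trials are missing. The query below is a minimal sketch rather than part of this PR: it assumes PostgreSQL, that labels->>'trialId' casts cleanly to an integer, and that metrics.trial_id is an integer column.

```sql
-- Trial IDs present in the old table but absent from the new partition.
-- Assumption: labels->>'trialId' is always a valid integer string, and
-- metrics.trial_id is an integer column.
SELECT (labels->>'trialId')::int AS trial_id
FROM trial_profiler_metrics
WHERE labels->>'trialId' IS NOT NULL
EXCEPT
SELECT trial_id
FROM metrics
WHERE partition_type = 'PROFILING';
```

An empty result means every trial in the old table has at least one row in the new partition. Like the count comparison, it does not check that every individual metric row was migrated.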

After the data migration succeeds, the old metrics should be present in the new metrics table partition. Find any trial that has profiling metrics in the old table and open the "profiler" tab in the web UI. The "Throughput" and "Timing metrics" sections should be empty (the web UI will deprecate them in a separate PR), but the "system metrics" section should render metrics along with dropdowns for metrics, agents, and GPUs:

Screenshot 2024-03-06 at 3 54 22 PM

Commentary (optional)

Checklist

  • [ ] Changes have been manually QA'd
  • [ ] User-facing API changes need the "User-facing API Change" label.
  • [ ] Release notes should be added as a separate file under docs/release-notes/. See Release Note for details.
  • [ ] Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

azhou-determined, Mar 06 '24 23:03