determined
feat: migrate existing profiler metrics to generic metrics [MD-300] [MD-301]
Description
This PR migrates the existing profiling metrics (system metrics only) from `trial_profiler_metrics` to the generic metrics table, `metrics`, and changes the existing profiler APIs to shim the old APIs onto the new schema.
Neither the data in `trial_profiler_metrics` nor the table itself is dropped at this time, in case we need to roll back. The table will be dropped after this feature lands.
Test Plan
You need access to the database you're testing against to confirm that the data migration ran successfully. As a rough check, the count of unique trial IDs should be equal in the old table and the new table:
select count(distinct labels->>'trialId') from trial_profiler_metrics;
select count(distinct trial_id) from metrics where partition_type='PROFILING';
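The two count queries above can be wrapped in a small check if you want to script the comparison. This is only a sketch: it assumes a Python DB-API 2.0 connection to the test database (for example one created with psycopg2), and `trial_counts_match` is a hypothetical helper name, not something in this PR.

```python
# Queries copied from the test plan above.
OLD_COUNT_SQL = (
    "select count(distinct labels->>'trialId') from trial_profiler_metrics"
)
NEW_COUNT_SQL = (
    "select count(distinct trial_id) from metrics "
    "where partition_type='PROFILING'"
)


def trial_counts_match(conn):
    """Return (old_count, new_count, match) for the migration check.

    `conn` is assumed to be any DB-API 2.0 connection to the database
    being tested; this helper is hypothetical and not part of the PR.
    """
    cur = conn.cursor()
    try:
        cur.execute(OLD_COUNT_SQL)
        old_count = cur.fetchone()[0]
        cur.execute(NEW_COUNT_SQL)
        new_count = cur.fetchone()[0]
    finally:
        cur.close()
    return old_count, new_count, old_count == new_count
```

Equal counts are only a rough signal, as noted above: they show that every trial made it across, not that every metric row did.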
After the data migration succeeds, the old metrics should be present in the profiling partition of the new `metrics` table. Find any trial that has profiling metrics in the old table and go to the web UI's "profiler" tab. The "Throughput" and "Timing metrics" sections should be empty (the web UI will deprecate them in a separate PR), but "system metrics" should render metrics along with dropdowns for metrics, agents, and GPUs.
Commentary (optional)
Checklist
- [ ] Changes have been manually QA'd
- [ ] User-facing API changes need the "User-facing API Change" label.
- [ ] Release notes should be added as a separate file under `docs/release-notes/`. See Release Note for details.
- [ ] Licenses should be included for new code which was copied and/or modified from any external code.