feat: implement profiler in Core API [MD-10] [MD-302]

Status: Open. azhou-determined opened this issue 11 months ago (0 comments).

Description

Implement system metric profiling in the Core API.

This is a complete rewrite of the old ProfilerAgent. The timing-metrics functionality has been removed, and system metrics are now reported to the generic metrics backend.
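To make the reporting flow more concrete, here is a minimal sketch of a background sampler that periodically collects system metrics and hands them, in batches, to a reporting callback. It is an illustration under stated assumptions, not the ProfilerAgent rewrite itself: `SystemMetricsSampler`, `sample_fn`, `report_fn`, `interval`, `samples_per_report`, and `timestamp_sample` are hypothetical names, and `report_fn` stands in for whatever call actually writes to the generic metrics backend.

```python
import threading
import time
from typing import Callable, Dict, List, Optional

Sample = Dict[str, float]


def timestamp_sample() -> Sample:
    # Hypothetical placeholder; a real sampler would read CPU, memory, GPU, etc.
    return {"timestamp": time.time()}


class SystemMetricsSampler:
    """Hypothetical sampler (not Determined's code): collects samples on a
    background thread and reports them in fixed-size batches."""

    def __init__(
        self,
        sample_fn: Callable[[], Sample],
        report_fn: Callable[[List[Sample]], None],
        interval: float = 1.0,
        samples_per_report: int = 5,
    ) -> None:
        self._sample_fn = sample_fn
        self._report_fn = report_fn
        self._interval = interval
        self._samples_per_report = samples_per_report
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._buffer: List[Sample] = []

    def on(self) -> None:
        # Start sampling on a daemon thread; calling on() twice is a no-op.
        if self._thread is not None:
            return
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def off(self) -> None:
        # Signal the thread, wait for it to exit, then flush any partial batch.
        if self._thread is None:
            return
        self._stop.set()
        self._thread.join()
        self._thread = None
        self._flush()

    def _run(self) -> None:
        while not self._stop.is_set():
            self._buffer.append(self._sample_fn())
            if len(self._buffer) >= self._samples_per_report:
                self._flush()
            # Event.wait acts as a sleep that returns early once off() is called.
            self._stop.wait(self._interval)

    def _flush(self) -> None:
        # Hand the buffered samples to the reporter and start a new batch.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._report_fn(batch)
```

Calling `on()` and `off()` around a workload mirrors the `core_context.profiler.on()` / `core_context.profiler.off()` pair in the Core API test script. In this sketch, batching trades a little reporting latency for fewer writes to the backend.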

Test Plan

Because this PR contains only the harness/Python changes, testing must be done manually and requires direct access to the database. After exercising each of the profiler entrypoints below, query the database and confirm that the new system metrics data is present:

select * from metrics where trial_id=TRIALID and partition_type='PROFILING';
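When checking several trials, binding the trial ID as a query parameter avoids hand-editing TRIALID each time. The sketch below uses Python's built-in sqlite3 module with a hypothetical, heavily simplified `metrics` table (only `trial_id`, `partition_type`, and a `metrics` text column, filled with made-up rows) purely to show the query shape. The real Determined database is PostgreSQL with a different schema; a PostgreSQL driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import sqlite3
from typing import List, Tuple


def profiling_rows(conn: sqlite3.Connection, trial_id: int) -> List[Tuple]:
    # Same filter as the SQL above, with trial_id bound as a parameter.
    cur = conn.execute(
        "SELECT * FROM metrics WHERE trial_id = ? AND partition_type = 'PROFILING'",
        (trial_id,),
    )
    return cur.fetchall()


# Hypothetical stand-in schema and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (trial_id INTEGER, partition_type TEXT, metrics TEXT)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        (1, "PROFILING", '{"example": 1}'),
        (1, "TRAINING", '{"x": 4}'),
        (2, "PROFILING", '{"example": 2}'),
    ],
)

rows = profiling_rows(conn, 1)
print(rows)  # → [(1, 'PROFILING', '{"example": 1}')]
```

A non-empty result for a trial indicates that profiling rows were written for it.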

Core API

import logging
import time

import determined as det
from determined import core


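def main(core_context):
    # Start collecting system metrics in the background.
    core_context.profiler.on()
    for batch in range(100):
        steps_completed = batch + 1
        # Report training metrics every 5 steps.
        if steps_completed % 5 == 0:
            core_context.train.report_training_metrics(
                steps_completed=steps_completed, metrics={"x": batch}
            )
        # Report validation metrics every 10 steps.
        if steps_completed % 10 == 0:
            core_context.train.report_validation_metrics(
                steps_completed=steps_completed, metrics={"x": batch}
            )
        # Simulate one second of work per batch.
        time.sleep(1)
    # Stop the profiler once the loop finishes.
    core_context.profiler.off()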


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format=det.LOG_FORMAT)

    with core.init() as core_context:
        main(core_context=core_context)
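One design point worth noting: the script above only reaches `core_context.profiler.off()` if the loop finishes normally. A small context manager makes the on/off pairing exception-safe. `profiling` is a hypothetical helper, not part of the Determined API; it assumes only that `core_context.profiler` exposes the `on()` and `off()` methods used above.

```python
import contextlib
from typing import Any, Iterator


@contextlib.contextmanager
def profiling(core_context: Any) -> Iterator[None]:
    """Hypothetical helper: keep the profiler on for the duration of a block."""
    core_context.profiler.on()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so the profiler is always stopped.
        core_context.profiler.off()
```

Inside `main`, the training loop would then be wrapped as `with profiling(core_context): ...` instead of calling `on()` and `off()` directly.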

Trainer API (PyTorch)

Run the MNIST example in examples/tutorials/mnist_pytorch with profiling enabled: trainer.fit(..., profiling_enabled=True)

TFKeras (harness)

Submit a TFKeras experiment with profiling enabled in the experiment config:

profiling:
    enabled: true
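If experiment configs are generated programmatically, the same setting can be applied as a dict transform before the config is written out and submitted. `with_profiling` is a hypothetical helper; it sets only the `profiling.enabled` key shown above and leaves any other keys untouched.

```python
import copy
from typing import Any, Dict


def with_profiling(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an experiment config with profiling.enabled set to true."""
    new_config = copy.deepcopy(config)  # never mutate the caller's dict
    profiling = new_config.setdefault("profiling", {})
    profiling["enabled"] = True
    return new_config
```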

Commentary (optional)

Checklist

  • [ ] Changes have been manually QA'd
  • [ ] User-facing API changes need the "User-facing API Change" label.
  • [ ] Release notes should be added as a separate file under docs/release-notes/. See Release Note for details.
  • [ ] Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

azhou-determined, Mar 06 '24 19:03