training_policies

Performance measurement for short run times and DVFS

Open TheKanter opened this issue 5 years ago • 2 comments

  • Need a run time of more than 5 minutes to get a good estimate of steady-state performance; otherwise power draw can exceed TDP for short periods, so it is not really a fair measurement.

  • E.g., scaling across many nodes can produce short run times, so the system stays in the “cold DVFS” regime (related to the temperature of the heat sink or cooling solution).

  • The problem is that an at-scale run-time measurement is then no longer representative (assuming the goal is a steady-state measurement).

  • Can we mitigate this through rules for how to run these systems?

  • Do we all agree with this assumption?

  • Mitigation

  • Testing protocol that requires a warm-up period, which exercises the hardware and burns power, followed by a measurement period

  • Require X warm-up runs to ensure that the total run time exceeds a minimum length

  • Or turn off dynamic clock (DK believes this is a bad bad idea!)

  • Chip vendors would need to provide guidance about this protocol

  • Minimum run time will vary based on cooling solution (e.g., liquid vs. air-cooled)

  • Biggest downside is increased complexity for groups which perform the submission runs (and it’s already complex)

  • The interaction with the caching policy (which requires a cold start) is complex in the “do X runs first” case
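The warm-up protocol proposed above could be sketched roughly as follows. This is only an illustration, not an agreed rule: the function names, the `workload` callable, and the 300-second default (taken from the 5-minute figure discussed in this thread for air-cooled systems) are assumptions, and the sketch does not address the caching-policy interaction.

```python
import math
import time
from typing import Callable, Tuple


def warmup_runs_needed(est_run_seconds: float, min_warmup_seconds: float) -> int:
    """Smallest X such that X back-to-back runs cover the minimum warm-up time.

    Hypothetical helper: est_run_seconds is an estimate of one run's duration.
    """
    if est_run_seconds <= 0:
        raise ValueError("est_run_seconds must be positive")
    return max(0, math.ceil(min_warmup_seconds / est_run_seconds))


def measure_after_warmup(
    workload: Callable[[], None],
    min_warmup_seconds: float = 300.0,  # assumed value; would vary by cooling solution
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, float]:
    """Repeat the workload until the warm-up time has elapsed, then time one run.

    Returns (number of warm-up runs, duration of the measured run in seconds).
    """
    warmup_runs = 0
    start = clock()
    # Warm-up phase: results are discarded, the goal is only to heat the system.
    while clock() - start < min_warmup_seconds:
        workload()
        warmup_runs += 1
    # Measurement phase: a single timed run immediately after warm-up.
    t0 = clock()
    workload()
    return warmup_runs, clock() - t0
```

For example, with runs of about 90 seconds and a 300-second minimum, `warmup_runs_needed(90, 300)` gives 4 warm-up runs before the measured run.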

DK: Discussion with an expert confirms that a 5-minute warm-up period would work for an air-cooled system. Must find details for liquid-cooled systems.

AIs (action items):

  • DK to talk to liquid cooling people

  • Vendors to talk to internal power management experts; please ask about liquid-cooled systems in particular

TheKanter, Nov 22 '19 17:11

One idea is to start power measurement at the 8- or 16-chip scale (roughly one "box"). All benchmarks appear to run for more than 5 minutes at this scale. We can look at other options for larger scales in the future.
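As a rough illustration of this idea, a submission tool could check which runs last longer than the 5-minute threshold before accepting a power measurement. The names below are hypothetical, and the threshold is just the figure mentioned in this thread.

```python
from typing import Dict, List, Tuple

# Assumed threshold: the 5-minute figure discussed in this thread.
MIN_RUN_SECONDS = 5 * 60


def eligible_for_power_measurement(
    run_seconds: float, min_seconds: float = MIN_RUN_SECONDS
) -> bool:
    """True if the run is strictly longer than the minimum steady-state duration."""
    return run_seconds > min_seconds


def split_by_eligibility(
    run_times: Dict[str, float], min_seconds: float = MIN_RUN_SECONDS
) -> Tuple[List[str], List[str]]:
    """Partition benchmark names into (eligible, too_short), each sorted by name."""
    eligible = sorted(
        name for name, secs in run_times.items()
        if eligible_for_power_measurement(secs, min_seconds)
    )
    too_short = sorted(name for name in run_times if name not in eligible)
    return eligible, too_short
```

Calling `split_by_eligibility` with a mapping from benchmark name to measured run time in seconds returns the benchmarks that could be power-measured at that scale and those that would need a different approach.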

bitfort, Apr 09 '20 16:04

Moving to backlog, since there is no power measurement in v0.7.

petermattson, Jun 10 '20 22:06