OpenROAD-flow-scripts
Change AutoTuner failed metrics from "-" to ERROR_METRIC
Description
Ran into this while building out the RTL designer demo.
TensorBoard seems to get "confused" when the trials are a mix of successful and failed runs: important metrics like effective_clk_period and die_size don't appear in the HPARAMS results. The previous workaround of touching all the files to change the order in which TensorBoard reads them didn't work for me, so I ended up pruning out all the variant directories for the failed runs. Adding tensorflow to the dependency list (suggested in https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/issues/3309) didn't work reliably either.
Based on this, I'm guessing that TensorBoard gets confused about the metric's data type and throws the metric away (even for the successful runs): some trials have a number for the metric and other trials have a "-", so it doesn't know how to handle them.
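To illustrate the suspected problem, here is a minimal sketch (metric names and values are illustrative, not taken from an actual run) of how a column of trial metrics becomes type-ambiguous once a failed trial reports "-" instead of a number:

```python
# Hypothetical trial results: successful trials report floats,
# failed trials report the string "-".
trial_metrics = [
    {"effective_clk_period": 4.57, "die_size": 1200.0},  # successful trial
    {"effective_clk_period": "-", "die_size": "-"},      # failed trial
]

def all_numeric(values):
    """True only if every value in the column is a number."""
    return all(isinstance(v, (int, float)) for v in values)

clk_values = [m["effective_clk_period"] for m in trial_metrics]
print(all_numeric(clk_values))  # False: the column mixes float and str
```

A consumer that expects a uniformly numeric column (as the HPARAMS view appears to) has no good option here, which would explain why the whole metric gets dropped even for the successful trials.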
Suggested Solution
My proposal is to change the AutoTuner evaluate() methods to return ERROR_METRIC instead of "-" for any metric. That way, the metric's data type is the same regardless of success or failure.
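A minimal sketch of the proposed change (the function body, metric names, and the ERROR_METRIC value are illustrative assumptions, not the actual AutoTuner code):

```python
# Assumed sentinel: a large float that keeps failed trials numeric.
ERROR_METRIC = 9e99

def evaluate(metrics):
    """Return a numeric score, using ERROR_METRIC for failed runs."""
    if metrics.get("status") != "success":
        # Previously this path returned "-", which mixed str and float
        # values across trials in the TensorBoard HPARAMS data.
        return ERROR_METRIC
    return metrics["effective_clk_period"]

print(evaluate({"status": "failure"}))
print(evaluate({"status": "success", "effective_clk_period": 4.57}))
```

With this, every trial contributes a float to the metric column, so the column type is unambiguous whether the run succeeded or failed.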
The downstream impact is that the HPARAMS Parallel Coordinates scales can be skewed by the sentinel value, but this can be addressed by limiting the Max value for the metric's column.
Additional Context
No response