MFC
Fix benchmark divide by zero with proper error message
We sometimes get a divide-by-zero error in the benchmark diff, and I'm not sure why it happens (e.g., https://github.com/MFlowCode/MFC/actions/runs/8610942830/job/23597260867?pr=285):
.=++*: -+*+=. | [email protected] [Linux]
:+ -*- == =* . | ----------------------------------------------------------
:*+ == ++ .+- | --jobs 1
:*##-.....:*+ .#%+++=--+=:::. | --mpi
-=-++-======#=--**+++==+*++=::-:. | --gpu
.:++=----------====+*= ==..:%..... | --no-debug
.:-=++++===--==+=-+= +. := | --targets pre_process, simulation, and post_process
+#=::::::::=%=. -+: =+ *: | ----------------------------------------------------------
.*=-=*=.. :=+*+: -...-- | $ ./mfc.sh (build, run, test, clean, count, packer) --help
Comparing Bencharks: master/bench-gpu.yaml is x times slower than pr/bench-gpu.yaml.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/main.py:65 in <module> │
│ │
│ 62 │ │ │
│ 63 │ │ __print_greeting() │
│ 64 │ │ __checks() │
│ ❱ 65 │ │ __run() │
│ 66 │ │
│ 67 │ except MFCException as exc: │
│ 68 │ │ cons.reset() │
│ │
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/main.py:50 in __run │
│ │
│ 47 │
│ 48 │
│ 49 def __run(): │
│ ❱ 50 │ {"test": test.test, "run": run.run, "build": │
│ 51 │ "clean": build.clean, "bench": bench.bench, "count": │
│ 52 │ "packer": packer.packer, "count_diff": count.count_diff, "bench_di │
│ 53 │ }[ARG("command")]() │
│ │
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/mfc/bench.py:119 in diff │
│ │
│ 116 │ │ │ if target.name not in lhs_summary or target.name not in rh │
│ 117 │ │ │ │ continue │
│ 118 │ │ │ │
│ ❱ 119 │ │ │ speedups[i] = f"{lhs_summary[target.name] / rhs_summary[ta │
│ 120 │ │ │
│ 121 │ │ table.add_row(f"[magenta]{slug}[/magenta]", *speedups) │
│ 122 │
╰──────────────────────────────────────────────────────────────────────────────╯
ZeroDivisionError: division by zero
ERROR: An unexpected exception occurred: division by zero
./mfc.sh: line 49: 239801 Terminated python3 "$(pwd)/toolchain/main.py" "$@"
Part of the fix is to raise a proper Python exception if either lhs_summary[target.name] or rhs_summary[target.name] is zero.
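A minimal sketch of what such a guard could look like, assuming each summary is a plain dict mapping a target name to a number. `BenchDiffError` and `speedup` are hypothetical names for illustration (the toolchain has its own `MFCException`, which would be the natural thing to raise), and the output format is a guess because the real f-string is truncated in the traceback above:

```python
class BenchDiffError(Exception):
    """Hypothetical stand-in for the toolchain's own exception type."""


def speedup(lhs_summary: dict, rhs_summary: dict, name: str) -> str:
    """Return lhs/rhs as a formatted ratio, or raise a descriptive error."""
    lhs, rhs = lhs_summary[name], rhs_summary[name]

    # Reject a zero on either side: a zero divisor crashes the division,
    # and a zero dividend means the comparison is meaningless anyway.
    for label, value in (("lhs", lhs), ("rhs", rhs)):
        if value == 0:
            raise BenchDiffError(
                f"{label}_summary[{name!r}] is zero; the case may not have "
                "run or its time was not recorded."
            )

    # Assumed format; the original format spec is not visible in the log.
    return f"{lhs / rhs:.2f}x"
```

With this in place, the failure names the offending side and target instead of surfacing as a bare ZeroDivisionError.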
Is there a reason why these are integer values and not floats?
@anandrdbz out of convenience, I suppose. Please see this issue for a possible fix https://github.com/MFlowCode/MFC/issues/393
@anandrdbz After watching the last PR fail a few times here (https://github.com/MFlowCode/MFC/actions/runs/8638692906/job/23683596528?pr=285), I'm not really sure why there's a divide-by-zero problem or what is happening. It seems like one of the tests failed (either PR or master), but it isn't reporting that. @henryleberre any idea what's going on? We could look into the logs for this as well.
Update: In that PR, I think it's because something in the PR is causing all of the cases to output 0 (likely the code @anandrdbz put in the .mako file). I suspect this is the problem whenever we see a divide-by-zero error: a case either didn't run or there's a bug in printing its length.
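One way to check that hypothesis before diffing is to list which entries in a summary recorded zero. This is only a sketch: `zero_entries` is a hypothetical helper, and it assumes the summary loaded from the benchmark YAML is a flat dict of target name to number, which the thread does not confirm.

```python
def zero_entries(summary: dict) -> list:
    """Return the names whose recorded value is zero, sorted for stable output."""
    return sorted(name for name, value in summary.items() if value == 0)


# Example with made-up numbers: two targets reported 0, one did not.
pr_summary = {"pre_process": 0, "simulation": 12, "post_process": 0}
print(zero_entries(pr_summary))
```

If every target shows up in the result, that points at the PR's output code rather than at a single failed case.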
Fixed by #423