
Fix benchmark divide by zero with proper error message

Open • sbryngelson opened this issue 1 year ago • 3 comments

I'm not sure why it happens, but the benchmark diff sometimes fails with a divide-by-zero error (e.g., https://github.com/MFlowCode/MFC/actions/runs/8610942830/job/23597260867?pr=285):

.=++*:          -+*+=.        | [email protected] [Linux]
     :+   -*-        ==   =* .      | ----------------------------------------------------------
   :*+      ==      ++    .+-       | --jobs 1
  :*##-.....:*+   .#%+++=--+=:::.   | --mpi
  -=-++-======#=--**+++==+*++=::-:. | --gpu
 .:++=----------====+*= ==..:%..... | --no-debug
  .:-=++++===--==+=-+=   +.  :=     | --targets pre_process, simulation, and post_process
  +#=::::::::=%=. -+:    =+   *:    | ----------------------------------------------------------
 .*=-=*=..    :=+*+:      -...--    | $ ./mfc.sh (build, run, test, clean, count, packer) --help

 Comparing Bencharks: master/bench-gpu.yaml is x times slower than pr/bench-gpu.yaml.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/main.py:65 in <module>                                     │
│                                                                              │
│   62 │   │                                                                   │
│   63 │   │   __print_greeting()                                              │
│   64 │   │   __checks()                                                      │
│ ❱ 65 │   │   __run()                                                         │
│   66 │                                                                       │
│   67 │   except MFCException as exc:                                         │
│   68 │   │   cons.reset()                                                    │
│                                                                              │
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/main.py:50 in __run                                        │
│                                                                              │
│   47                                                                         │
│   48                                                                         │
│   49 def __run():                                                            │
│ ❱ 50 │   {"test":   test.test,     "run":        run.run,          "build":  │
│   51 │    "clean":  build.clean,   "bench":      bench.bench,      "count":  │
│   52 │    "packer": packer.packer, "count_diff": count.count_diff, "bench_di │
│   53 │   }[ARG("command")]()                                                 │
│                                                                              │
│ /storage/coda1/p-sbryngelson3/0/sbryngelson3/runners/actions-runner-4/_work/ │
│ MFC/MFC/toolchain/mfc/bench.py:119 in diff                                   │
│                                                                              │
│   116 │   │   │   if target.name not in lhs_summary or target.name not in rh │
│   117 │   │   │   │   continue                                               │
│   118 │   │   │                                                              │
│ ❱ 119 │   │   │   speedups[i] = f"{lhs_summary[target.name] / rhs_summary[ta │
│   120 │   │                                                                  │
│   121 │   │   table.add_row(f"[magenta]{slug}[/magenta]", *speedups)         │
│   122                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
ZeroDivisionError: division by zero
 

ERROR: An unexpected exception occurred: division by zero

./mfc.sh: line 49: 239801 Terminated              python3 "$(pwd)/toolchain/main.py" "$@"

Part of the fix is to raise a proper Python exception, with a clear error message, if either lhs_summary[target.name] or rhs_summary[target.name] is zero.
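A minimal sketch of such a check, assuming each summary maps a target name to a numeric run time and that the left-hand side is master and the right-hand side is the PR (as the "Comparing Bencharks" header in the log suggests). `MFCException` below is a local stand-in for the toolchain's exception class, whose module is not shown in the traceback; `format_speedup` and the `:.2f` formatting are hypothetical, since the original f-string on line 119 is truncated in the log.

```python
class MFCException(Exception):
    """Local stand-in for the toolchain's MFCException (real import path not shown here)."""


def format_speedup(lhs_summary: dict, rhs_summary: dict, name: str) -> str:
    """Return lhs/rhs as a speedup string, or raise a readable error on a zero time.

    Hypothetical helper: assumes both summaries map target names to run times.
    """
    lhs = lhs_summary[name]
    rhs = rhs_summary[name]

    # A zero on either side makes the comparison meaningless (not only a zero
    # denominator), so report both values instead of letting Python raise a
    # bare ZeroDivisionError.
    if lhs == 0 or rhs == 0:
        raise MFCException(
            f"Cannot compare benchmarks for target '{name}': "
            f"lhs time = {lhs}, rhs time = {rhs}. "
            "A case may not have run, or its time may not have been recorded."
        )

    return f"{lhs / rhs:.2f}"
```

Since main.py already has an `except MFCException as exc:` handler (line 67 in the traceback), raising the real `MFCException` at this point should produce a clean error message instead of the "An unexpected exception occurred" path, assuming that handler prints the message.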

sbryngelson • Apr 10 '24 02:04

Is there a reason why these are integer values and not floats?
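To illustrate why the type can matter: if run times were truncated to whole seconds (an assumption; the thread does not say how these values are produced), any target finishing in under a second would be stored as 0 and the ratio would be undefined. The timings below are made up, and `safe_ratio` is a hypothetical helper.

```python
def safe_ratio(lhs, rhs):
    """Return lhs / rhs, or None when the denominator is zero (hypothetical helper)."""
    return None if rhs == 0 else lhs / rhs


# Made-up run times in seconds for one target: master vs. PR.
master_s, pr_s = 1.4, 0.6

# Stored as integers, int() truncates toward zero: 1.4 -> 1, 0.6 -> 0,
# so the PR time becomes 0 and the ratio is undefined.
int_ratio = safe_ratio(int(master_s), int(pr_s))

# Stored as floats, the ratio is well defined (about 2.33).
float_ratio = safe_ratio(master_s, pr_s)

print(int_ratio, float_ratio)
```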

anandrdbz • Apr 10 '24 21:04

@anandrdbz Out of convenience, I suppose. Please see this issue for a possible fix: https://github.com/MFlowCode/MFC/issues/393

sbryngelson • Apr 10 '24 21:04

@anandrdbz I watched the last PR fail a few times here: https://github.com/MFlowCode/MFC/actions/runs/8638692906/job/23683596528?pr=285

I'm not sure why there's a divide-by-zero problem or what is happening. It seems like one of the tests failed (either PR or master), but that failure isn't being reported. @henryleberre, any idea what's going on? We could look into the logs for this as well.

Update: In that PR, I think it's because something in the PR causes all of the cases to output 0 (likely the code @anandrdbz put in the .mako file). I suspect this is the problem whenever we see a divide-by-zero error: either a case didn't run, or there's a bug in printing its length.
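Since the likely root cause is a case that either did not run or printed a zero, one option is to validate both summaries before building the table and list every offending case at once, distinguishing "missing" from "zero". This matters because the existing diff loop silently `continue`s past targets missing from either summary (lines 116 and 117 in the traceback). A sketch, assuming each summary is a dict from target name to run time; `find_bad_timings` is a hypothetical name.

```python
from typing import Dict, List


def find_bad_timings(lhs_summary: Dict[str, float],
                     rhs_summary: Dict[str, float],
                     targets: List[str]) -> List[str]:
    """Describe every target whose timing is missing or non-positive on either side.

    Hypothetical helper: an empty list means the diff is safe to compute.
    """
    problems = []
    for name in targets:
        for label, summary in (("lhs", lhs_summary), ("rhs", rhs_summary)):
            if name not in summary:
                problems.append(f"{name}: no timing in {label} (case may not have run)")
            elif summary[name] <= 0:
                problems.append(f"{name}: {label} timing is {summary[name]} (not recorded?)")
    return problems


# Made-up summaries: the PR reports 0 for simulation, and post_process
# reports 0 on master and is missing from the PR entirely.
problems = find_bad_timings(
    {"simulation": 12.0, "post_process": 0},
    {"simulation": 0},
    ["simulation", "post_process"],
)
if problems:
    # In the toolchain, this list could become the message of a single exception.
    print("\n".join(problems))
```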

sbryngelson • Apr 11 '24 00:04

Fixed by #423

sbryngelson • May 24 '24 15:05