Log config and runtime details of each benchmark in Shark Tank
For each benchmark run, please include in the results:
- details about tuning config, compiler flags used, etc.
- details about runtime parameters, e.g. the number of threads used
- any useful output, traces, and log files.
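For concreteness, a minimal sketch of what one such per-run record might contain. Every field name below is illustrative, not an existing Shark Tank schema; the two flags come from the discussion further down in this thread:

```python
# Illustrative sketch only: the record shape and field names are hypothetical.
example_run_record = {
    "benchmark": "resnet50_fp32",  # hypothetical model name
    "compiler_flags": ["--iree-codegen-llvm-number-of-threads=16"],
    "tuning_config": "configs/cpu_default.json",  # or an ID to look it up later
    "runtime_params": {"task_topology_group_count": 16},
    "logs": ["logs/resnet50_fp32.compile.log", "logs/resnet50_fp32.run.log"],
}
```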
@monorimet @dan-garvey
WIP
Hey there, a few questions as I'm implementing this:
- What kind of threading is the IREE team interested in having reported in benchmark results? PyTorch and TF have APIs for fetching the number of inter- and intra-op parallelism threads used, as well as separate APIs for other parallelized processes such as input preprocessing.
- I am planning on having `pytest --benchmark` generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.). We don't currently have much trace/log information exposed in our API, but if there is something specific to be included, please let me know so I can have it fetched or generated.
> What kind of threading is the IREE team interested in having reported in benchmark results? PyTorch and TF have APIs for fetching the number of inter- and intra-op parallelism threads used, as well as separate APIs for other parallelized processes such as input preprocessing.
The key here is to make sure the number of runtime threads used is the same for IREE and the baselines (PyTorch and TF). I know the default configs are being used at the moment, so any info about the number of threads on all runtimes would be helpful, especially for debugging. IREE currently has the `--task_topology_group_count` runtime flag and the `--iree-codegen-llvm-number-of-threads` compiler flag, so if we find that a baseline is using a different configuration, we can adjust accordingly and get a more apples-to-apples comparison.
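For reference, a minimal sketch of how the baseline thread counts could be queried for logging. The PyTorch and TF calls below are their standard threading APIs; recording them alongside IREE's `--task_topology_group_count` value is an assumption about how the log would be structured:

```python
import torch
import tensorflow as tf

# Query the parallelism settings in effect for each baseline runtime, so they
# can be logged next to the IREE run's --task_topology_group_count value.
# Note: the TF getters return 0 when the runtime is left to pick its default.
thread_report = {
    "torch_intra_op": torch.get_num_threads(),
    "torch_inter_op": torch.get_num_interop_threads(),
    "tf_intra_op": tf.config.threading.get_intra_op_parallelism_threads(),
    "tf_inter_op": tf.config.threading.get_inter_op_parallelism_threads(),
}
print(thread_report)
```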
> I am planning on having `pytest --benchmark` generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.). We don't currently have much trace/log information exposed in our API, but if there is something specific to be included, please let me know so I can have it fetched or generated.
Full repro steps are the goal, including MLIR files, input files, model files, etc., for debugging. Would it be possible to also save the tuning configs (or an ID for the tuning config used, so that we can look it up after the fact)?
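One way to make the tuning config recoverable after the fact is to hash the config file into the repro log. A sketch, where the helper name, its parameters, and the log layout are all hypothetical:

```python
import hashlib
import json
from pathlib import Path

def log_repro_info(log_dir, run_name, compile_flags, tuning_config_path, artifacts):
    """Write one repro-log entry per benchmark run (all names are illustrative)."""
    entry = {
        "run": run_name,
        "compile_flags": compile_flags,  # exact compile-time flags used
        # Hash the tuning config so the exact config can be looked up later by ID.
        "tuning_config_id": hashlib.sha256(
            Path(tuning_config_path).read_bytes()
        ).hexdigest(),
        # Paths to the .mlir, input, and model files needed to reproduce the run.
        "artifacts": {name: str(path) for name, path in artifacts.items()},
    }
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / f"{run_name}.json").write_text(json.dumps(entry, indent=2))
```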
I suggest that for TF we use our n2-highcpu-64 Ice Lake instances. Each has two NUMA nodes of 16 cores (no hyperthreading), and the Intel TF build pins TF to one NUMA node. We can set IREE to use the same 16 threads on one NUMA node with `numactl`.
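A sketch of what that pinning could look like when launching the IREE side of the benchmark. `--task_topology_group_count` is the runtime flag mentioned above; the module/function flags and file names are placeholders, not a confirmed invocation:

```python
import subprocess

# Pin the IREE run to NUMA node 0 (one 16-core node on n2-highcpu-64) to match
# the Intel TF baseline, and request the same number of runtime threads.
cmd = [
    "numactl", "--cpunodebind=0", "--membind=0",
    "iree-run-module",
    "--task_topology_group_count=16",  # match the baseline's 16 threads
    "--module=model.vmfb",             # placeholder artifact name
    "--function=forward",              # placeholder entry point
]
subprocess.run(cmd, check=True)
```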
Stale issue, closing.