METIS
Large 2000x performance regression from 5.1 to 5.2
Using METIS 5.1:
pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
• Combine mesh parts (0.02s)
• Construct graph (0.00s)
• Partition graph (0.01s)
• Renumber vertices (0.03s)
• Repartition mesh (0.01s)
• Write mesh (0.01s)
where the partitioning and renumbering (both of which make calls to METIS_PartGraphRecursive) complete almost immediately. By contrast, using METIS 5.2.1:
pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
• Combine mesh parts (0.01s)
• Construct graph (0.00s)
• Partition graph (17.33s)
• Renumber vertices (7.22s)
• Repartition mesh (0.01s)
• Write mesh (0.01s)
where we can see a huge slowdown (on the order of 2000x) for the partition graph step, which makes a single call to METIS_PartGraphRecursive. The inputs are identical in both cases. The slowdown also reproduces with METIS_PartGraphKway, and on both Linux (x86-64) and macOS (AArch64).
This occurs with all of our grids/meshes. Profiling 5.2.1 with perf record we find:
17.05% pyfr libmetis.so.0 [.] libmetis__FM_Mc2WayCutRefine
8.88% pyfr libmetis.so.0 [.] libmetis__CreateCoarseGraph
8.03% pyfr libmetis.so.0 [.] libmetis__FM_2WayCutRefine
7.93% pyfr libmetis.so.0 [.] libmetis__rpqInsert
5.24% pyfr libmetis.so.0 [.] libmetis__rpqUpdate
5.12% pyfr libc.so.6 [.] random
4.21% pyfr libmetis.so.0 [.] libmetis__Compute2WayPartitionParams
4.21% pyfr libmetis.so.0 [.] libmetis__rpqGetTop
4.21% pyfr libmetis.so.0 [.] libmetis__Match_SHEM
4.01% pyfr libmetis.so.0 [.] libmetis__SelectQueue
2.79% pyfr libmetis.so.0 [.] libmetis__iset
2.77% pyfr libmetis.so.0 [.] libmetis__Project2WayPartition
2.20% pyfr libmetis.so.0 [.] libmetis__Match_RM
1.99% pyfr libmetis.so.0 [.] libmetis__ComputeLoadImbalanceDiffVec
1.93% pyfr libmetis.so.0 [.] libmetis__McGeneral2WayBalance
1.85% pyfr libmetis.so.0 [.] libmetis__iaxpy
1.34% pyfr libmetis.so.0 [.] libmetis__rpqDelete
1.22% pyfr libmetis.so.0 [.] libmetis__BucketSortKeysInc
whereas with 5.1 (good) we find:
10.18% pyfr libopenblas64_p-r0-15028c96.3.21.so [.] blas_thread_server
9.73% pyfr [unknown] [k] 0xffffffff900001a2
9.30% pyfr libc.so.6 [.] __sched_yield
8.22% pyfr libpython3.11.so.1.0 [.] _PyEval_EvalFrameDefault
1.00% pyfr libpython3.11.so.1.0 [.] 0x0000000000192fb0
0.96% pyfr libpython3.11.so.1.0 [.] 0x00000000001949c0
0.80% pyfr libmetis.so.0 [.] libmetis__FM_Mc2WayCutRefine
0.59% pyfr libpython3.11.so.1.0 [.] _PyType_Lookup
0.57% pyfr libmetis.so.0 [.] libmetis__rpqInsert
where METIS is just a rounding error in the runtime.
Can you share the graphs in Metis format to reproduce this locally?
So I sat down and bisected the git revisions and found the culprit was:
https://github.com/KarypisLab/METIS/commit/5ba158081c1373adf6de94d1e7a257f31490b9d3
which causes ABI breakage. Without recompilation, any METIS 5.1 application will pass an incorrect options array with 5.2 due to every option past METIS_OPTION_DBGLVL being shifted down by one.
I'll put together a PR later which gives these enum options explicit values so such breakage can be avoided in the future as/when new options are added.
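As a minimal sketch of what explicit values could look like: each existing option is pinned to the index it had in the 5.1.0 header, and options added later take values that were previously unused. The `DEMO_OPTION_*` names, the helper function, and the values 25 to 27 are hypothetical; they are not taken from any METIS release or from the eventual PR.

```c
/* Hypothetical sketch, not the METIS header: each option is pinned to an
 * explicit index so that inserting a new option cannot renumber the others. */
typedef enum {
  DEMO_OPTION_PTYPE     = 0,
  DEMO_OPTION_DBGLVL    = 5,
  DEMO_OPTION_NITER     = 6,   /* same index as in the 5.1.0 layout */
  DEMO_OPTION_NUMBERING = 17,  /* same index as in the 5.1.0 layout */
  /* Options added later take unused values (25..27 are assumed here). */
  DEMO_OPTION_NIPARTS   = 25,
  DEMO_OPTION_ONDISK    = 26,
  DEMO_OPTION_DROPEDGES = 27
} demo_option_et;

/* Returns 1 when a binary built with the old index for NUMBERING (17)
 * would still address the right slot under this enum. */
int demo_numbering_index_is_stable(void) {
  return DEMO_OPTION_NUMBERING == 17;
}
```

With a layout like this, an application compiled against the older header keeps writing to the slots the library expects, and only code that wants the new options needs to be rebuilt.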
This reordering of options broke using METIS 5.2.1 from MUMPS for me. They have code like
MUMPS_INT ncon, edgecut, options[40];
ierr=METIS_SetDefaultOptions(options);
options[0] = 0;
/* Use 1-based fortran numbering */
options[17] = 1;
ncon = 1;
ierr = METIS_PartGraphKway(n, &ncon, iptr, jcn,
NULL, NULL, NULL,
k, NULL, NULL, options,
&edgecut, part);
and METIS wrote a lot of complaints about the graph to the log and then crashed.
Of course, it's not good that the MUMPS people assumed that METIS_OPTION_NUMBERING will always be 17 (they even include metis.h), but it seems that this could easily have been avoided on the METIS side, too (or could be fixed in 5.2.2).
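The failure mode can be sketched without linking METIS. The index values below are copied from the two moptions_et layouts quoted later in this thread; the `demo_` helpers, the `V510_`/`V511_` names, and the -1 placeholder are hypothetical stand-ins, not METIS or MUMPS identifiers.

```c
enum { DEMO_NOPTIONS = 40 };  /* same size as options[40] in the MUMPS snippet */

/* Indices of three options in the 5.1.0 layout and in the 5.1.1/5.2.1 layout. */
enum { V510_NSEPS = 15, V510_UFACTOR = 16, V510_NUMBERING = 17 };
enum { V511_NSEPS = 17, V511_UFACTOR = 18, V511_NUMBERING = 19 };

/* Fill every slot with a placeholder so that later writes are visible. */
void demo_fill(int *options, int value) {
  for (int i = 0; i < DEMO_NOPTIONS; i++)
    options[i] = value;
}

/* Mirrors the hard-coded write in the MUMPS snippet: always slot 17. */
void demo_set_numbering_hardcoded(int *options) {
  options[17] = 1;
}

/* Uses the symbolic index of the newer layout instead. */
void demo_set_numbering_symbolic(int *options) {
  options[V511_NUMBERING] = 1;
}
```

Read against the newer layout, the hard-coded write lands in the NSEPS slot and leaves NUMBERING untouched, which is consistent with METIS rejecting a 1-based graph. Against the real header, the symbolic form is `options[METIS_OPTION_NUMBERING] = 1;`, which follows whatever layout the installed metis.h declares at compile time. It still requires a rebuild whenever the library's layout changes.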
If I change
@@ -271,12 +271,10 @@ typedef enum {
METIS_OPTION_IPTYPE,
METIS_OPTION_RTYPE,
METIS_OPTION_DBGLVL,
- METIS_OPTION_NIPARTS,
METIS_OPTION_NITER,
METIS_OPTION_NCUTS,
METIS_OPTION_SEED,
METIS_OPTION_NO2HOP,
- METIS_OPTION_ONDISK,
METIS_OPTION_MINCONN,
METIS_OPTION_CONTIG,
METIS_OPTION_COMPRESS,
@@ -285,6 +283,8 @@ typedef enum {
METIS_OPTION_NSEPS,
METIS_OPTION_UFACTOR,
METIS_OPTION_NUMBERING,
+ METIS_OPTION_NIPARTS,
+ METIS_OPTION_ONDISK,
METIS_OPTION_DROPEDGES,
/* Used for command-line parameter purposes */
Mumps works fine again. (I would have complained there if they had a public issue tracker :))
I am maintaining the conda-forge build of METIS, so when the dust settles here let me know and I can bump the version and/or add a patch.
So I sat down and bisected the git revisions and found the culprit was:
which causes ABI breakage. Without recompilation, any METIS 5.1 application will pass an incorrect options array with 5.2 due to every option past METIS_OPTION_DBGLVL being shifted down by one.
Just for the sake of completeness, that commit is also included in METIS 5.1.1, so even an application built with METIS 5.1.0 will already give wrong results when used at runtime with METIS 5.1.1.
For reference, this is moptions_et in METIS 5.1.0:
/*! Options codes (i.e., options[]) */
typedef enum {
METIS_OPTION_PTYPE,
METIS_OPTION_OBJTYPE,
METIS_OPTION_CTYPE,
METIS_OPTION_IPTYPE,
METIS_OPTION_RTYPE,
METIS_OPTION_DBGLVL,
METIS_OPTION_NITER,
METIS_OPTION_NCUTS,
METIS_OPTION_SEED,
METIS_OPTION_NO2HOP,
METIS_OPTION_MINCONN,
METIS_OPTION_CONTIG,
METIS_OPTION_COMPRESS,
METIS_OPTION_CCORDER,
METIS_OPTION_PFACTOR,
METIS_OPTION_NSEPS,
METIS_OPTION_UFACTOR,
METIS_OPTION_NUMBERING,
/* Used for command-line parameter purposes */
METIS_OPTION_HELP,
METIS_OPTION_TPWGTS,
METIS_OPTION_NCOMMON,
METIS_OPTION_NOOUTPUT,
METIS_OPTION_BALANCE,
METIS_OPTION_GTYPE,
METIS_OPTION_UBVEC
} moptions_et;
and this is in METIS 5.1.1 and 5.2.1:
/*! Options codes (i.e., options[]) */
typedef enum {
METIS_OPTION_PTYPE,
METIS_OPTION_OBJTYPE,
METIS_OPTION_CTYPE,
METIS_OPTION_IPTYPE,
METIS_OPTION_RTYPE,
METIS_OPTION_DBGLVL,
METIS_OPTION_NIPARTS,
METIS_OPTION_NITER,
METIS_OPTION_NCUTS,
METIS_OPTION_SEED,
METIS_OPTION_NO2HOP,
METIS_OPTION_ONDISK,
METIS_OPTION_MINCONN,
METIS_OPTION_CONTIG,
METIS_OPTION_COMPRESS,
METIS_OPTION_CCORDER,
METIS_OPTION_PFACTOR,
METIS_OPTION_NSEPS,
METIS_OPTION_UFACTOR,
METIS_OPTION_NUMBERING,
METIS_OPTION_DROPEDGES,
/* Used for command-line parameter purposes */
METIS_OPTION_HELP,
METIS_OPTION_TPWGTS,
METIS_OPTION_NCOMMON,
METIS_OPTION_NOOUTPUT,
METIS_OPTION_BALANCE,
METIS_OPTION_GTYPE,
METIS_OPTION_UBVEC
} moptions_et;
the diff is:
--- 5.1.0
+++ 5.1.1
@@ -6,10 +6,12 @@
METIS_OPTION_IPTYPE,
METIS_OPTION_RTYPE,
METIS_OPTION_DBGLVL,
+ METIS_OPTION_NIPARTS,
METIS_OPTION_NITER,
METIS_OPTION_NCUTS,
METIS_OPTION_SEED,
METIS_OPTION_NO2HOP,
+ METIS_OPTION_ONDISK,
METIS_OPTION_MINCONN,
METIS_OPTION_CONTIG,
METIS_OPTION_COMPRESS,
@@ -18,6 +20,7 @@
METIS_OPTION_NSEPS,
METIS_OPTION_UFACTOR,
METIS_OPTION_NUMBERING,
+ METIS_OPTION_DROPEDGES,
/* Used for command-line parameter purposes */
METIS_OPTION_HELP,
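The diff implies a fixed index translation for the 18 options that exist in both layouts: the first six are unchanged, the next four move up by one, and the rest move up by two. A minimal sketch of a table-driven remap follows. The `shim_` names are hypothetical, and whether METIS treats the `unset` value passed for the new slots as "use the default" is an assumption that is not verified here.

```c
enum { SHIM_NOPTIONS = 40, SHIM_NOLD = 18 };

/* New index for each 5.1.0 option, derived from the diff above. */
static const int shim_old_to_new[SHIM_NOLD] = {
  0, 1, 2, 3, 4, 5,               /* PTYPE .. DBGLVL: unchanged             */
  7, 8, 9, 10,                    /* NITER .. NO2HOP: +1 (NIPARTS at 6)     */
  12, 13, 14, 15, 16, 17, 18, 19  /* MINCONN .. NUMBERING: +2 (ONDISK at 11) */
};

/* Copy an options array written in the 5.1.0 layout into the 5.1.1/5.2.1
 * layout. Slots with no 5.1.0 counterpart are set to `unset`. */
void shim_remap_options(const int *old_opts, int *new_opts, int unset) {
  for (int i = 0; i < SHIM_NOPTIONS; i++)
    new_opts[i] = unset;
  for (int i = 0; i < SHIM_NOLD; i++)
    new_opts[shim_old_to_new[i]] = old_opts[i];
}
```

A table like this only helps a caller that knows which layout its array was written in; recompiling against the installed header remains the simpler fix where that is possible.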
For anyone interested, a patch that makes MUMPS 5.2.1 work with METIS 5.1.1 and 5.2.1 (at the cost of breaking compatibility with METIS 5.1.0), and that seems to work, is available at https://github.com/conda-forge/mumps-feedstock/blob/c524cb3c71686bee59d9b12df5d9d6ce20782ce4/recipe/mumps_support_only_metis_5_1_1.patch .