
ML/Zoltan: ML and Zoltan require METIS to use 32-bit ordinals but do not enforce it

Open jjellio opened this issue 2 years ago • 1 comments

Bug Report

@trilinos/ml @trilinos/zoltan

Description

I've been building METIS/ParMETIS through Spack, which lets you choose the ordinal sizes. I talked with Karen Devine a while back about Zoltan1/2 in this context as well.

I think SNL's legacy packages (Zoltan and ML, maybe more) have an implicit requirement that METIS use 32-bit ordinals, but right now there isn't a way to enforce this. Zoltan2 will handle this correctly (I believe) because it is templated on the METIS types.

This is sort of a bug, but also not. ML/Zoltan are probably 100% correct in their usage of METIS, but Trilinos probably needs some way to ensure that the ordinals are compatible. It seems like hitting this as a bug would require METIS to return some value larger than a 32-bit int, which ML (or Zoltan) would then corrupt by seeing only the first 32 bits of it. It would be a rather annoying error to trace, as it probably wouldn't cause a crash.
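The failure mode described above can be illustrated in isolation. This is a minimal sketch, not code from ML or Zoltan: `low32`, `fits_in_int`, and `demo_truncation` are hypothetical helpers, and `int64_t` stands in for a 64-bit `idx_t`. It assumes the "first 32 bits" a caller sees are the low-order ones, as on a little-endian machine.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: the low 32 bits of a 64-bit index, i.e. what a
 * caller that only looks at 32 bits would see. Conversion to an unsigned
 * type is well defined in C (reduction modulo 2^32). */
uint32_t low32(int64_t v)
{
    return (uint32_t)(uint64_t)v;
}

/* Hypothetical helper: nonzero if a 64-bit index survives storage in int. */
int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Small self-check of the two helpers. */
void demo_truncation(void)
{
    int64_t small = 42;                     /* fits in 32 bits */
    int64_t big = ((int64_t)1 << 32) + 5;   /* needs more than 32 bits */

    assert(fits_in_int(small) && low32(small) == 42u);
    assert(!fits_in_int(big));
    assert(low32(big) == 5u);               /* plausible but wrong, no crash */
}
```

The last assertion is the annoying case: the truncated value looks like a perfectly valid small index, so nothing fails at the point where the data goes bad.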

I've picked on ML, since I see a nice compiler warning, but I am pretty sure Zoltan is in the same boat.

Example Warning
[1263/5377] Building C object packages/ml/src/CMakeFiles/ml.dir/Coarsen/ml_agg_ParMETIS.c.o
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:672:13: warning: incompatible pointer types assigning to 'int *' from 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
    wgtflag = (indextype *) ML_allocate (4*sizeof(indextype));
            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:738:8: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            wgtflag, &numflag, &ncon, &N_parts, tpwgts,
                            ^~~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:67:29: note: passing argument to parameter 'wgtflag' here
             idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts, 
                                   ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:738:17: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            wgtflag, &numflag, &ncon, &N_parts, tpwgts,
                                     ^~~~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:67:45: note: passing argument to parameter 'numflag' here
             idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts, 
                                                   ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:738:27: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            wgtflag, &numflag, &ncon, &N_parts, tpwgts,
                                               ^~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:67:61: note: passing argument to parameter 'ncon' here
             idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts, 
                                                                   ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:738:34: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            wgtflag, &numflag, &ncon, &N_parts, tpwgts,
                                                      ^~~~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:67:74: note: passing argument to parameter 'nparts' here
             idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts, 
                                                                                ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:738:44: warning: incompatible pointer types passing 'float *' to parameter of type 'real_t *' (aka 'double *') [-Wincompatible-pointer-types]
                            wgtflag, &numflag, &ncon, &N_parts, tpwgts,
                                                                ^~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:68:15: note: passing argument to parameter 'tpwgts' here
             real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part, 
                     ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:739:8: warning: incompatible pointer types passing 'float *' to parameter of type 'real_t *' (aka 'double *') [-Wincompatible-pointer-types]
                            &ubvec,options,
                            ^~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:68:31: note: passing argument to parameter 'ubvec' here
             real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part, 
                                     ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:739:15: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            &ubvec,options,
                                   ^~~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:68:45: note: passing argument to parameter 'options' here
             real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part, 
                                                   ^
/g/g20/jjellio/src/github/Trilinos-a/packages/ml/src/Coarsen/ml_agg_ParMETIS.c:740:8: warning: incompatible pointer types passing 'int *' to parameter of type 'idx_t *' (aka 'long *') [-Wincompatible-pointer-types]
                            &edgecut, part, &ParMETISComm);
                            ^~~~~~~~
/p/lustre1/jjellio/spack/install/cray-rhel8-zen3/cce-14.0.0/parmetis-4.0.3-jpgzminxnku2kjev6oh3u5wj5yh4wdmz/include/parmetis.h:68:61: note: passing argument to parameter 'edgecut' here
             real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part, 
                                                                   ^
9 warnings generated.

This arises when you build METIS with

  ^ parmetis@4.0.3 ~gdb~ipo+int64~shared %cce@14.0.0 arch=cray-rhel8-zen3 \
  ^ metis@5: ~gdb+int64+real64~shared %cce@14.0.0 arch=cray-rhel8-zen3 \

Maybe an easy start would be to add something to FindPackage(Metis) that can express the ordinal size and print a CMake warning if ML or Zoltan is enabled and the ordinal is not 32-bit.
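A compile-time guard inside the legacy packages could complement a CMake warning. The following is a sketch under assumptions, not existing Trilinos code: METIS 5's `metis.h` defines `IDXTYPEWIDTH` as 32 or 64, and a fallback definition stands in for that header here so the snippet compiles on its own. `sketch_idx_t` and `ordinal_widths_match` are hypothetical names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Stand-in for the value metis.h would provide; a real build would
 * include <metis.h> instead of relying on this fallback. */
#ifndef IDXTYPEWIDTH
#define IDXTYPEWIDTH 32
#endif

#if IDXTYPEWIDTH == 32
typedef int32_t sketch_idx_t;   /* hypothetical stand-in for idx_t */
#else
typedef int64_t sketch_idx_t;
#endif

/* Fails the build, rather than only emitting pointer-type warnings, when
 * the index type is not the size that int-based calling code assumes. */
static_assert(sizeof(sketch_idx_t) == sizeof(int),
              "METIS index type must be the same size as int");

/* Runtime view of the same condition, e.g. for a configure-time test. */
int ordinal_widths_match(void)
{
    return sizeof(sketch_idx_t) == sizeof(int);
}
```

Compiling this with `-DIDXTYPEWIDTH=64` on a platform where `int` is 32 bits should stop at the static assertion. A similar check on `REALTYPEWIDTH` could cover the `float` versus `real_t` warnings in the log.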

I could be wrong about this, but I've been fussing with this ordinal thing for a few months now. Maybe the answer is to build with 32-bit ordinals, but I think SEACAS likes to use METIS with 64-bit ones. So either way, figuring out what SNL wants for an ordinal size with these TPLs would be nice.

@ccober6

jjellio, Sep 12 '22 18:09

As I recall, neither Zoltan nor Zoltan2 requires 32-bit ordinals in ParMETIS. Doesn't Trilinos' automated testing use 64-bit builds of ParMETIS, thus providing a counter-example?

If Zoltan is built with both ParMETIS and Scotch, it does require that both ParMETIS and Scotch have the same ordinal size.

kddevin, Sep 26 '22 01:09

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot], Sep 27 '23 12:09

This issue was closed due to inactivity for 395 days.

github-actions[bot], Oct 28 '23 12:10