mpich icon indicating copy to clipboard operation
mpich copied to clipboard

ROMIO: check error class instead of error string

Open wkliao opened this issue 5 years ago • 19 comments

MPI_File_set_view() returns error class MPI_ERR_ARG when disp argument is negative

Pull Request Description

Expected Impact

Author Checklist

  • [ ] Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • [ ] Remove xfail from the test suite when fixing a test
  • [ ] Commits are self-contained and do not do two things at once
  • [ ] Commit message is of the form: module: short description and follows good practice
  • [ ] Passes whitespace checkers
  • [ ] Passes warning tests
  • [ ] Passes all tests
  • [ ] Add comments such that someone without knowledge of the code could understand
  • [ ] You or your company has a signed contributor's agreement on file with Argonne
  • [ ] For non-Argonne authors, request an explicit comment from your companies PR approval manager

wkliao avatar Jun 29 '20 08:06 wkliao

@wkliao What are the differences between src/mpi/romio/test and test/mpi/io? Could we merge or synchronize them?

hzhou avatar Jul 03 '20 15:07 hzhou

I notice test/mpi/io was available in the release of mpich2-1.0. To me, test programs in test/mpi/io can be moved to src/mpi/romio/test.

wkliao avatar Jul 04 '20 01:07 wkliao

To me, test programs in test/mpi/io can be moved to src/mpi/romio/test.

If we do that, then our test suite have to treat io tests differently from the rest of the tests, breaking every of our current testing setup. So that is hardly acceptable.

It is also very undesirable to maintain two sets of tests (as in current situation), one in the main test suite, and the other in src/mpi/romio/test.

What I would prefer is to have them synchronized. In the main repository, we only maintain the tests in test/mpi/io (and also in various sub tests such as test/mpi/f77/io, etc.). Can we simply let ROMIO autogen.sh copy the tests to src/mpi/romio/test? That way we have the flexibility of running test via main test suite or separately in romio. It should also work with standalone ROMIO tarball (if we make one).

What do you think?

hzhou avatar Jul 04 '20 19:07 hzhou

@wkliao Are the set of PRs you just created all parts of split from the original "standalone" PR? Could you refresh me how the standalone ROMIO supposed to work? Is it like -- user have a separately installed MPI implementation that is without MPI-IO, and then build and install ROMIO completely independently? How would mpi.h work? Will it always contain the declarations for MPI-IO? How would user program work? Do they always need link -lromio explicitly?

hzhou avatar Jul 04 '20 19:07 hzhou

I suggest the other way around. MPICH's autogen.sh copies the test programs from ROMIO into MPICH. Here is my argument. ROMIO can become a stand alone module to be used by other MPI implementations, such as OpenMPI. It is better to make the test programs available to other implementations.

All my recent PRs are breakdown of #4008 and I have not finished it yet.

Stand-alone ROMIO means it can be built separately from MPICH using CC=mpicc for example, where mpicc can be MPICH, OpenMPI, or Cray MPI that was built with its own MPI-IO implementation. Linking the stand-alone ROMIO will overwrite the MPI-IO subroutines of these MPI libraries. There is no ROMIO header file. All MPI user programs still include just mpi.h and all they need to do is to add -lromio at the end of link command line.

wkliao avatar Jul 05 '20 03:07 wkliao

I suggest the other way around. MPICH's autogen.sh copies the test programs from ROMIO into MPICH. Here is my argument. ROMIO can become a stand alone module to be used by other MPI implementations, such as OpenMPI. It is better to make the test programs available to other implementations.

Either could work. Since we are going to maintain the tests, and since we do not maintain other tests -- datatype, rma, f77, etc. -- in their own sub-tree, it doesn't make sense for us to treat IO tests separately. With autogen solution, the tests will be available inside ROMIO to whoever building ROMIO standalone, right?

Stand-alone ROMIO ...

I see, so you are building ROMIO as if it is a profile library. Need make sure the profile interface still work when ROMIO is not building standalone -- so you'll need a "--enable-standalone" configure option, right?

hzhou avatar Jul 05 '20 04:07 hzhou

If ROMIO is made stand alone, one should be able to run "make dist" to create a tar ball that can be built stand alone outside of MPICH. Running ROMIO's own autogen.sh should not be necessary. I still think that keeping I/O test programs in ROMIO subtree is a better solution (making ROMIO self-contained).

Once ROMIO becomes stand alone, one can simply run configure command like any other autoconf software. Option such as "--enable-standalone" is not necessary. In #4008 I mentioned two example configure commands.

wkliao avatar Jul 05 '20 05:07 wkliao

If ROMIO is made stand alone, one should be able to run "make dist" to create a tar ball that can be built stand alone outside of MPICH.

You need run autogen.sh before runningmake dist, right?

hzhou avatar Jul 05 '20 12:07 hzhou

My goal is to make ROMIO stand-alone (self-contain). Ideally, all files it needs should be stored under its source tree.

autogen.sh is one of the cord that needs to be cut. Luckily, it just copies MPICH's confdb and MPL subtrees.

Regarding to confdb, ROMIO only uses two macros: PAC_CONFIG_SUBDIR_ARGS and PAC_GET_FORTNAMES. For MPL, ROMIO can be built with or without it. Note ROMIO includes configure option --with-mpl-prefix[=DIR]

wkliao avatar Jul 05 '20 13:07 wkliao

My goal is to make ROMIO stand-alone (self-contain).

I think I am fine if you insist to maintain the io tests inside ROMIO tree. It is more important to have two sets of tests synchronized or merged. It is a simple matter to have autogen.sh to copy the tests one way or the other.

Would you have the bandwidth to look into the tests and merge them? In addition to tests in test/mpi/io, there are also tests under f77/io, cxx/io, errors/io, and I am thinking of adding threads/io. So to synchronize, can romio/test/ also mirror the sub-test structure?

Our CI only tests under test/mpi. Without synchronization, whatever work under romio/test/ is not being tested. Since we are more detached from ROMIO code, without sufficient, automatic tests, it will be very slow for us to review and merge any of the ROMIO PRs.

hzhou avatar Jul 05 '20 14:07 hzhou

Once ROMIO becomes stand alone, one can simply run configure command like any other autoconf software. Option such as "--enable-standalone" is not necessary. In #4008 I mentioned two example configure commands.

On platforms doesn't support weak symbols, such as mac, we build mpi-libraries twice. Once for libpmpi.so, with all mpi functions names as PMPI_...; and again for libmpi.so, with all mpi functions named as MPI_... but without any internal functions. mpicc will link both libmpi.so and libpmpi.so so both MPI_xxx and PMPI_xxx will be linked to the same set of internal functions and will work. When user installs a profiling library, It will replace libmpi.so and internally call PMPI_xxx.

Now as you described, the standalone version is different from both build, it will need build mpi functions as MPI_xxx and also include internal functions. So this is a third mode of build distinct from the previous two. The result can't be directly linked with either libmpi.so nor libpmpi.so (on platforms without weak symbol support). The former will collide on MPI_xxx symbols, the latter will collide on internal function names.

On platforms with proper weak symbol support, we can just build it once with names PMPI_xxx and set MPI_xxx as weak aliases. Then a standalone ROMIO will need to build it with names 'MPI_xxx` without the weak aliases. Again, this is a different build from the former.

Am I missing something?

hzhou avatar Jul 05 '20 14:07 hzhou

For MPL, ROMIO can be built with or without it. Note ROMIO includes configure option --with-mpl-prefix[=DIR]

It doesn't make sense to add some dependency without gaining something. What do we gain or lose when we build ROMIO with or without mpl?

hzhou avatar Jul 05 '20 14:07 hzhou

For weak symbol issue, I assume you are referring to #4689. Maybe your post should be there. Note my PR is not skip the build of entire profiling APIs. It is about APIs with prefix MPIOI_File_.

Regarding to MPL, please note the option --with-mpl-prefix[=DIR] is already in ROMIO. I did not add it myself. My understanding is MPL provides multithreading support. You know MPL better than I.

wkliao avatar Jul 05 '20 15:07 wkliao

For weak symbol issue, I assume you are referring to #4689. Maybe your post should be there. Note my PR is not skip the build of entire profiling APIs. It is about APIs with prefix MPIOI_File_.

It is about understanding the entire premises of your goal. Without that understanding, most of these PRs may be viewed as gratuitous changes, which will most likely get neglected.

Regarding to MPL, please note the option --with-mpl-prefix[=DIR] is already in ROMIO. I did not add it myself. My understanding is MPL provides multithreading support. You know MPL better than I.

Just did a survey, and it appears ROMIO uses quite a few MPL macros, It does not use MPL threading utilities though. The way I understand is ROMIO currently doesn't have multi-threading safety, which needs work. The OpenMPI's approach of wrapping every MPI-IO function with a lock is one way to do it, but obviously less than ideal.

The default configure is to use the embedded mpl. That is different from without mpl support. Does ROMIO compile if you remove the embedded mpl subfolder?

hzhou avatar Jul 05 '20 18:07 hzhou

I don't know why you keep emphasizing the weak symbol issue. Can you point out which of my PRs is causing this problem?

If the current ROMIO supports both weak and strong symbols, then the stand-alone ROMIO should and will keep it that way. (Is this what you try to tell me?)

Current ROMIO fails to build if without MPL. Without MPL, ROMIO loses some features, but can still function well. #4008 disables MPL but I plan to change that to make ROMIO to be able to build with MPL.

My understanding of current ROMIO is it also wraps every MPI-IO call with a lock/unlock of mutex. I remember we discussed this before.

wkliao avatar Jul 05 '20 18:07 wkliao

I don't know why you keep emphasizing the weak symbol issue. Can you point out which of my PRs is causing this problem?

Because the way you described to me --

Stand-alone ROMIO means it can be built separately from MPICH using CC=mpicc for example, where mpicc can be MPICH, OpenMPI, or Cray MPI that was built with its own MPI-IO implementation. Linking the stand-alone ROMIO will overwrite the MPI-IO subroutines of these MPI libraries. There is no ROMIO header file. All MPI user programs still include just mpi.h and all they need to do is to add -lromio at the end of link command line.

-- is based on the MPI profiling interface. ROMIO build with MPICH should support the profiling interface, but your standalone config should not, or it will cause conflict. Can you link the same library twice?

Can you point out which of my PRs is causing this problem?

I have to understand the premises before reviewing the PR. Otherwise I wouldn't know what tests to check.

Current ROMIO fails to build if without MPL. Without MPL, ROMIO loses some features, but can still function well. #4008 disables MPL but I plan to change that to make ROMIO to be able to build with MPL.

Could you point me to the commits that disables MPL?

My understanding of current ROMIO is it also wraps every MPI-IO call with a lock/unlock of mutex. I remember we discussed this before.

Not anymore. Current ROMIO in MPICH is naked. We could simply Add ROMIO_CS_ENTER/EXIT in every MPI_File_xxx functions. I haven't done that because ideally I would like to understand exactly which global states in ROMIO needs protection so we can do it more precisely. And before that, we'll need some multi-thread io tests.

hzhou avatar Jul 05 '20 20:07 hzhou

I guess ROMIO must require a compiler that supports weak symbols and mpicc is built with weak symbol enabled when built stand-alone.

On my mac, I was able to link an MPI-IO program without link errors.

% mpicc -g -O2 -o aggregation1 aggregation1.o libromio.a 

% mpicc --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

% mpichversion
MPICH Version:    	3.3
MPICH Release date:	unreleased development copy
MPICH Device:    	ch3:nemesis
MPICH configure: 	--silent --prefix=/Users/wkliao/MPICH --with-device=ch3:nemesis --disable-fortran CC=gcc
MPICH CC: 	gcc    -O2
MPICH CXX: 	g++   -O2
MPICH F77: 	  
MPICH FC: 	  
MPICH Custom Information: 	

Commit 81b626ad937a72283abe0def777d9512fbc84f9e disable MPL. Please check those MPL_ macros in romio/adio/include/adioi.h Note I plan to revise to add an option to ROMIO to enable/disable MPL.

Regarding to multithreading in ROMIO, I am seeing ROMIO_THREAD_CS_ENTER and ROMIO_THREAD_CS_EXIT wrapping around MPI-IO APIs. If ROMIO is built embedded inside of MPICH, these functions calls MPIR_ MPL_. See romio/mpi-io/mpioimpl.h

#define ROMIO_THREAD_CS_ENTER() MPIR_Ext_cs_enter()
#define ROMIO_THREAD_CS_EXIT() MPIR_Ext_cs_exit()
#define ROMIO_THREAD_CS_YIELD() MPL_thread_yield()

wkliao avatar Jul 06 '20 03:07 wkliao

On my mac, I was able to link an MPI-IO program without link errors.

Interesting. Did you build the MPICH then the standalone ROMIO using the same ROMIO source?

I guess ROMIO must require a compiler that supports weak symbols and mpicc is built with weak symbol enabled when built stand-alone.

I am a bit confused. "must" implies it won't work otherwise, right? Then you follow by saying it works on mac which I know it doesn't do weak symbols... What is your mpicc -show?

Commit 81b626a disable MPL. Please check those MPL_ macros in romio/adio/include/adioi.h Note I plan to revise to add an option to ROMIO to enable/disable MPL.

Ahh, thanks. That makes sense. I would recommend create a separate header files, for example mpl_replacement.h, then you can simply configure to swap between mpl.h or mpl_replacement.h. It is cleaner than messing up the adioi.h IMO.

Do you also have a replacement equivalent for mpir_ext.h and glue_romio.c?

Regarding to multithreading in ROMIO, I am seeing ROMIO_THREAD_CS_ENTER and ROMIO_THREAD_CS_EXIT wrapping around MPI-IO APIs. If ROMIO is built embedded inside of MPICH, these functions calls MPIR_ MPL_. See romio/mpi-io/mpioimpl.h

Ahh, yes, they are protected with romio_mutex. Thanks for reminding me :).

Now back to making ROMIO standalone -- among the many new PRs, which are essential for that? For example, this PR I would think is not essential for the standalone build, right? It is fixing a test. While we could just merge it, but I would prefer if the tests are synchronized so we could simply run the CI tests to confirm it.

hzhou avatar Jul 06 '20 04:07 hzhou

% mpicc -show
gcc -framework OpenCL -Wl,-flat_namespace -I/Users/wkliao/MPICH/include -L/Users/wkliao/MPICH/lib -lmpi -lpmpi

File config.log indicates weak symbol is supported.

configure:30575: checking whether __attribute__ ((weak)) allowed
configure:30599: result: yes
configure:30603: checking whether __attribute__ ((weak_import)) allowed
configure:30627: result: yes
configure:30630: checking whether __attribute__((weak,alias(...))) allowed
configure:30669: result: no

Those PRs are from #4008 , they were developed during the course of my development for stand-alone ROMIO. They thus far are not yet essential PRs for stand-alone ROMIO. I believe they can be merged, like normal PRs. The coming PRs from me will be essential for stand-alone ROMIO.

wkliao avatar Jul 06 '20 04:07 wkliao