BigMPI

Information about backend support missing?

Open olesenm opened this issue 1 year ago • 3 comments

Although this project looks somewhat dead, I did have some questions:

The idea of chucking really large messages into a new datatype looks quite appealing. However, it's not clear whether the various backends (e.g., InfiniBand) would actually support sizes greater than 2^31. Is there any central information about this available?

olesenm, Apr 22 '24 13:04

The project is maintained (bugs will be fixed) but not developed (features will not be added). Once the feature set I proposed made it into MPI 4.0, I moved on to other things.

Many of the back-ends for MPI, e.g. UCX, already support size_t counts, in which case they should be fine already. There may still be implementation bugs, of course. There have been discussions lately about how to test MPICH better, for example.

If you find things don't work somewhere, please report them and I'll do my best to help. I have access to a pretty good range of hardware.

jeffhammond, Apr 22 '24 13:04

Wow - fast feedback, and great offer too! At the moment, I was considering that I might just be able to incorporate some of your BigMPI ideas without the whole thing, since we already have an encapsulated interface to MPI. It looks like we would "only" need to mimic your conditional creation of the new MPI datatype and then test, test, test.

In your code you have BigMPI_Factorize_count: is that just playing about, or is it useful/critical? I don't really have a feeling for how expensive the type creation and freeing are.

olesenm, Apr 23 '24 13:04
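As an illustration of the "conditional creation" idea discussed above, here is a minimal sketch in plain C of the arithmetic only: decide whether a count still fits in an `int`, and if not, split it into full blocks of `INT_MAX` elements plus a remainder. The names `big_count_split`, `fits_in_int`, and `split_count` are hypothetical and not part of BigMPI or MPI; the MPI calls that would consume the result are only named in comments.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type (not from BigMPI): how a large element count
 * splits into pieces that each fit in an int. */
typedef struct {
    size_t chunks;    /* number of full blocks of INT_MAX elements */
    int    remainder; /* leftover elements, always < INT_MAX */
} big_count_split;

/* Nonzero if the count can be passed directly as an int count,
 * in which case no derived datatype is needed at all. */
static int fits_in_int(size_t count)
{
    return count <= (size_t)INT_MAX;
}

/* Split count so that chunks * INT_MAX + remainder == count. */
static big_count_split split_count(size_t count)
{
    big_count_split s;
    s.chunks    = count / (size_t)INT_MAX;
    s.remainder = (int)(count % (size_t)INT_MAX);
    /* A real wrapper would also have to check that chunks fits in an int
     * before describing the full blocks with something like
     * MPI_Type_vector, then describe the remainder with
     * MPI_Type_contiguous and combine the two with
     * MPI_Type_create_struct. */
    return s;
}
```

Assuming a 64-bit `size_t` and `INT_MAX == 2147483647`, `split_count(5000000000)` gives 2 full chunks and a remainder of 705032706. For counts where `fits_in_int` is true, the wrapper can skip datatype creation entirely and pass the count through unchanged.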

Yes, allowing others to borrow what they need from BigMPI and integrate it into their own projects, to avoid another dependency, was an explicit design goal.

BigMPI_Factorize_count is not strictly necessary, and it seems BIGMPI_AVOID_TYPE_CREATE_STRUCT isn't defined by default, so it shouldn't be compiled.

BigMPI_Factorize_count exists so that one can use a vector rather than a struct datatype. In any reasonable implementation of MPI, both will be recognized as a big contiguous slab and optimized to the same thing. I haven't verified this, but implementers tell me it's true.

jeffhammond, Apr 23 '24 14:04
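To make the factorization idea from the last comment concrete, here is a rough sketch in plain C of what such a helper might look like. It is not BigMPI's implementation; the name `factorize_count` and its return convention are assumptions made for illustration. It searches for two factors `a` and `b` with `a * b == count`, both representable as `int`, so that a single vector type could describe the whole buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not BigMPI's code): find a and b such that
 * a * b == count and both fit in an int.
 * Returns 0 on success; returns 1 and sets *a = *b = -1 if no such
 * pair exists (e.g. count is zero, prime and > INT_MAX, or too large). */
static int factorize_count(uint64_t count, int *a, int *b)
{
    const uint64_t int_max = (uint64_t)INT_MAX;

    if (count != 0) {
        /* Smallest candidate g for which count / g still fits in an int:
         * ceil(count / INT_MAX), computed without overflow. */
        uint64_t lo = count / int_max + (count % int_max != 0);

        /* Only search up to sqrt(count); g <= count / g avoids overflow. */
        for (uint64_t g = lo; g <= int_max && g <= count / g; g++) {
            if (count % g == 0) {
                *a = (int)g;
                *b = (int)(count / g);
                return 0;
            }
        }
    }
    *a = -1;
    *b = -1;
    return 1;
}
```

On success, the pair could be passed to something like `MPI_Type_vector(a, b, b, oldtype, &newtype)`, where a stride equal to the block length makes the layout contiguous. The search can take up to roughly sqrt(count) iterations and fails for counts with no suitable factor pair, which is one reason a struct of "full chunks plus remainder" is the more general fallback.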