ompi
ompi copied to clipboard
ompi/datatype: use size_t for count arguments
The use of int for count arguments is becoming restrictive especially to adopt large count. This change extends the argument type to size_t.
This change in API will break the MPI datatype management in many ways (basically everything that relies on the datatype storage description such as one-sided, IO, and all the datatype representation manipulation functions, the combiner_*). The root cause is that the internal storage format for the datatype, and all the functions to expose it to other libraries are based on int32_t.
I have the feeling that the correct solution is to split the datatype API in two, one used by MPI where we follow MPI expectations, and the internal one where we can use counts but we will not provide any data representation support (the support will remain at the MPI level). Based on this, we will only use the internal API inside OMPI and the external API will be reserved for the MPI layer.
I also had the concern of breaking API compatibility but none of our CI(internal and gh action) actually caught this so that's interesting.
I was reading the code and saw that the count argument is sometimes typed size_t like here - so I have a hunch that it should work somehow and it is possible to change the type safely.
I also agree with the proposed solution to introduce another internal API - I haven't yet figured out how.
@wenduwan It looks like you are still working on this. Do you want to move this PR to Draft?
@bosilca Thanks for your comments. I'm looking into this PR again.
After a fresh look at the change, it appears to have a much larger blast radius than I expected.
The original revision might started with the wrong place - it was changing ompi datatype APIs but that requires adaptation to the internal opal datatypes. With that in mind, I think it is better to start with opal datatypes instead. Updated PR accordingly.
Still looking at other opal functions to find out what else needs to change.