ompi
Utilize PMIx type checking for PML check
It appears that sometimes we get a bogus value back for the PML selected by a remote proc. Unfortunately, the current code for exchanging that info removes all type information from the exchange.
Replace the use of those macros with native PMIx calls so we can check that the data type being returned is the one we expected. True, it should not have changed - but this will help detect and debug any errors.
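A minimal sketch of the idea, for illustration only. It does not use the real PMIx API: `typed_value_t`, `check_remote_pml`, and the `PML_*` result codes are hypothetical stand-ins for a value that carries a type tag next to its payload. The point is that the tag is checked before the payload is ever handed to `strcmp`. The PML names in the usage note (`ob1`, `ucx`) are just sample inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a typed key-value result; NOT the PMIx API. */
typedef enum { VAL_UNDEF = 0, VAL_STRING, VAL_UINT32 } val_type_t;

typedef struct {
    val_type_t type;            /* tag describing which union member is valid */
    union {
        const char *string;
        unsigned int uint32;
    } data;
} typed_value_t;

/* Hypothetical result codes for the comparison. */
typedef enum {
    PML_MATCH = 0,   /* remote proc selected the same PML */
    PML_MISMATCH,    /* remote proc selected a different PML */
    PML_BAD_TYPE,    /* value came back, but it is not a string */
    PML_NOT_FOUND    /* no value was returned for the key */
} pml_check_t;

/* Compare the remote PML name against the local one, but only after
 * confirming that the returned value really is a non-NULL string. */
pml_check_t check_remote_pml(const typed_value_t *remote, const char *local_pml)
{
    assert(local_pml != NULL);

    if (remote == NULL) {
        return PML_NOT_FOUND;
    }
    if (remote->type != VAL_STRING || remote->data.string == NULL) {
        /* Report the problem instead of running strcmp on a blind object. */
        return PML_BAD_TYPE;
    }
    return (strcmp(remote->data.string, local_pml) == 0) ? PML_MATCH
                                                         : PML_MISMATCH;
}
```

With this shape, a caller that receives `PML_BAD_TYPE` can print an error and exit cleanly. For example, a value tagged `VAL_UINT32` compared against `"ob1"` yields `PML_BAD_TYPE` rather than an undefined read.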
So...would it be better for the user to blindly segfault when there is an error, instead of detecting that something is wrong, reporting the error, and cleanly exiting? Remember, in the case in question, you were just being lucky - you were treating a blind object as a char* and running strcmp on it, when in fact it was NOT a string. Is that the desired programming style in OMPI?
It isn't just a bug in PMIx that could cause the situation - could happen between two OMPI procs in different jobs, each running a different version. Where do you guarantee someone will always post the same type of data for a given key?
This PR is just adding some protection. Someone else, when they have time, can go through the entire ompi code base and add similar protection. If this passes CI, I'm merging.
- Then we should provide a similar patch for all instances of string storage in the entire OMPI code base, not fix just the one that happened to highlight a bug in another library.
- If PMIx had returned either the correct value for the requested key or NULL, this patch would not be needed.