mpich icon indicating copy to clipboard operation
mpich copied to clipboard

coll: error-checking code missing macro guard

Open lykke-li opened this issue 2 years ago • 4 comments

Some bcast implementations (e.g., intra_pipelined_tree and intra_tree) missed the #ifdef HAVE_ERROR_CHECKING macro, which seems inconsistent when configuring with ./configure --disable-error-checking.

Issue details:

In the intra_binomial implementation, error checking is enclosed within the #ifdef HAVE_ERROR_CHECKING macro. Here's an example snippet:

#ifdef HAVE_ERROR_CHECKING
    /* check that we received as much as we expected */
    MPIR_Get_count_impl(status_p, MPI_BYTE, &recvd_size);
    MPIR_ERR_COLL_CHECK_SIZE(recvd_size, nbytes, errflag, mpi_errno_ret);
#endif

However, in the intra_tree and intra_pipelined_tree implementations, similar error checking code exists but without the #ifdef HAVE_ERROR_CHECKING macro. This seems inconsistent when configuring with ./configure --disable-error-checking.

lykke-li avatar Dec 12 '23 22:12 lykke-li

@lykke-li Thanks you for reporting the issue. I recall you mentioned that the current intra_tree algorithm will run into failures when configured with --disable-error-checking, right?

hzhou avatar Dec 13 '23 03:12 hzhou

@lykke-li Thanks you for reporting the issue. I recall you mentioned that the current intra_tree algorithm will run into failures when configured with --disable-error-checking, right?

@hzhou The failure occurs specifically when the received size differs from the expected receive size. This issue arises only when modifications are made to the internal algorithm implementation. If the internal implementation remains untouched, there should be no issues (just error-checking for intra_tree cannot be disabled). For example, if a user just use general MPI_bcast API in their application, then it is fine, no failure.

lykke-li avatar Dec 13 '23 04:12 lykke-li

Oh, I see. Your modification relies on receiving different amounts of data from the original size, thus you need to disable the error checking, right?

hzhou avatar Dec 13 '23 04:12 hzhou

Oh, I see. Your modification relies on receiving different amounts of data from the original size, thus you need to disable the error checking, right?

@hzhou Yes.

lykke-li avatar Dec 13 '23 04:12 lykke-li