
MPI.Reducev

Open calebwin opened this issue 3 years ago • 5 comments

It would be nice if we could have an MPI.Reducev for reducing variable-sized Julia objects. I saw a Python implementation, possibly the GitHub gist in [1], which I roughly ported to Julia, probably with some mistakes given my limited understanding of the MPI.jl API:

function Reducev(value, op, comm::MPI.Comm)
    # Reduces values on all processes to a single value on rank 0.
    # 
    # This function does the same thing as the function MPI_Reduce using
    # only MPI_Send and MPI_Recv. As shown, it operates with additions on
    # integers, so you could trivially use MPI_Reduce, but for operations
    # on variable size structs for which you cannot define an MPI_Datatype,
    # you can still use this method, by modifying it to use your op
    # and your datastructure.

    # MPI.recv deserializes into a new object, so no receive buffer needs
    # to be preallocated.
    tag = 0
    size = MPI.Comm_size(comm)
    rank = MPI.Comm_rank(comm)
    lastpower = 1 << fastlog2(UInt32(size))

    # each of the ranks greater than the last power of 2 less than size
    # need to downshift their data, since the binary tree reduction below
    # only works when N is a power of two.
    for i in lastpower:(size-1)
        if rank == i
            MPI.send(value, i - lastpower, tag, comm)
        end
    end
    for i in 0:(size-lastpower-1)
        if rank == i
            received, _ = MPI.recv(i + lastpower, tag, comm)
            value = op(value, received)
        end
    end

    # Binary-tree reduction over the first `lastpower` ranks.
    for d in 0:(fastlog2(UInt32(lastpower))-1)
        for k in 0:(1 << (d + 1)):(lastpower-1)
            receiver = k
            sender = k + (1 << d)
            if rank == receiver
                received, _ = MPI.recv(sender, tag, comm)
                value = op(value, received)
            elseif rank == sender
                MPI.send(value, receiver, tag, comm)
            end
        end
    end
    value
end

function fastlog2(v::UInt32)
    multiply_de_bruijn_bit_position::Vector{Int32} = [
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    ]

    v |= v >> 1 # first round down to one less than a power of 2 
    v |= v >> 2
    v |= v >> 4
    v |= v >> 8
    v |= v >> 16

    # De Bruijn lookup of the highest set bit position; 0x07C4ACDD is a UInt32
    # literal in Julia and the multiplication wraps modulo 2^32.
    multiply_de_bruijn_bit_position[((v * 0x07C4ACDD) >> 27) + 1]
end
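For illustration, here is a minimal, hypothetical driver showing how the sketch above could be exercised with variable-sized per-rank data. The file name, the payload, and the choice of vcat as the reduction op are assumptions for illustration only; run with e.g. mpiexec -n 4 julia reducev_demo.jl:

# reducev_demo.jl -- hypothetical driver; assumes Reducev and fastlog2 above
# are defined (or included) in this file.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Each rank contributes a vector whose length depends on its rank, so no
# fixed-size MPI.Datatype describes the payload.
value = collect(1:(rank + 1))

result = Reducev(value, vcat, comm)
if rank == 0
    println("reduced length = ", length(result))
end

MPI.Finalize()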

[1] https://gist.github.com/rmcgibbo/7178576

calebwin avatar Dec 19 '21 07:12 calebwin

Do you want something like reduce from mpi4py: https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI.Comm.html#mpi4py.MPI.Comm.reduce

simonbyrne avatar Dec 19 '21 20:12 simonbyrne

Huh, I'm not sure - do you know if that allows variable-sized data? The documentation doesn't seem to say..

calebwin avatar Dec 20 '21 02:12 calebwin

Yeah, their docs are a bit unclear, but the implementation is based on serializing send and recv: https://github.com/mpi4py/mpi4py/blob/18f16da45eaaacfba4cc0293964e1a6548d4c6a6/demo/reductions/reductions.py#L12-L41
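For reference, that serializing-send/recv approach corresponds roughly to the following sketch in MPI.jl terms. This is not mpi4py's actual code, just a naive fold to the root using the lowercase MPI.send/MPI.recv, which serialize arbitrary Julia objects; the positional signatures match the MPI.jl API current at the time of this thread, and the function name is made up for illustration:

# Naive serialization-based reduce: every non-root rank sends its arbitrarily
# sized object to root, which folds the received values with `op`.
# Illustration only; a tree-based scheme like Reducev above scales better.
using MPI

function reduce_serialized(value, op, comm::MPI.Comm; root::Integer=0, tag::Integer=0)
    rank = MPI.Comm_rank(comm)
    nprocs = MPI.Comm_size(comm)
    if rank == root
        acc = value
        for src in 0:(nprocs - 1)
            src == root && continue
            received, _ = MPI.recv(src, tag, comm)  # deserializes the object
            acc = op(acc, received)
        end
        return acc
    else
        MPI.send(value, root, tag, comm)  # serializes `value`
        return nothing
    end
end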

simonbyrne avatar Dec 20 '21 19:12 simonbyrne

oh, actually that is a demo, not what is actually used. It appears to be this reduce which calls PyMPI_reduce, which in turn calls either

simonbyrne avatar Dec 20 '21 19:12 simonbyrne

Hmm.. it looks like PyMPI_reduce_p2p is what we would want to implement.

calebwin avatar Dec 21 '21 21:12 calebwin