Wrapper around parsec_ce to prevent GPU manager from hitting MPI

Open devreal opened this issue 1 year ago • 3 comments

Description

PTG and DTD use remote deps, which provide a command queue to which threads can offload remote operations. TTG doesn't use that facility and won't for the foreseeable future (the return on investment is too small to seriously tackle this). Instead, we use the parsec_ce there, which provides raw access to the function pointer into the CE. That is a bad design in the first place. We need an abstraction around parsec_ce that can decide whether the calling thread should call into MPI directly or not, and a mechanism to thread-shift operations to the communication thread.

The problem with the current approach is that any thread can hit MPI when handling successors in TTG. Device management threads should never hit MPI, ever. So at some point a decision has to be made that the thread calling parsec_ce should thread-shift its operation to the communication thread.

That will also involve buffer management (i.e., getting a buffer from the CE that is then managed by the CE going forward).
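To make the idea concrete, here is a rough sketch of what such a layer could look like. None of these names exist in PaRSEC: `thread_role_t`, `ce_wrap_submit`, `ce_wrap_progress` and the queue are all hypothetical, and `count_op` is only a stand-in for a real call through the CE. The policy (only device manager threads are forced to shift, everyone else calls directly) is an assumption as well, and buffer management is left out entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thread roles; these are not PaRSEC types. */
typedef enum { ROLE_WORKER, ROLE_DEVICE_MANAGER, ROLE_COMM } thread_role_t;

/* A deferred communication operation (stand-in for a call into the CE). */
typedef void (*ce_op_fn_t)(void *arg);

typedef struct ce_op {
    ce_op_fn_t    fn;
    void         *arg;
    struct ce_op *next;
} ce_op_t;

/* Simple mutex-protected FIFO of operations shifted to the comm thread. */
typedef struct {
    pthread_mutex_t lock;
    ce_op_t        *head, *tail;
} ce_shift_queue_t;

static _Thread_local thread_role_t my_role = ROLE_WORKER;
static ce_shift_queue_t shift_queue = { PTHREAD_MUTEX_INITIALIZER, NULL, NULL };

/* Each thread declares its role once at startup. */
static void ce_wrap_set_role(thread_role_t role) { my_role = role; }

/* Run the operation inline, or shift it to the communication thread.
 * Assumed policy: only device manager threads must never call directly.
 * Returns 1 if executed inline, 0 if deferred, -1 on allocation failure. */
static int ce_wrap_submit(ce_op_fn_t fn, void *arg)
{
    assert(fn != NULL);
    if (my_role != ROLE_DEVICE_MANAGER) {
        fn(arg);
        return 1;
    }
    ce_op_t *op = malloc(sizeof(*op));
    if (op == NULL) return -1;
    op->fn = fn;
    op->arg = arg;
    op->next = NULL;
    pthread_mutex_lock(&shift_queue.lock);
    if (shift_queue.tail != NULL) shift_queue.tail->next = op;
    else                          shift_queue.head = op;
    shift_queue.tail = op;
    pthread_mutex_unlock(&shift_queue.lock);
    return 0;
}

/* Meant to be called from the communication thread's progress loop:
 * detach the pending list under the lock, then run it without the lock.
 * Returns the number of operations executed. */
static int ce_wrap_progress(void)
{
    pthread_mutex_lock(&shift_queue.lock);
    ce_op_t *op = shift_queue.head;
    shift_queue.head = shift_queue.tail = NULL;
    pthread_mutex_unlock(&shift_queue.lock);

    int executed = 0;
    while (op != NULL) {
        ce_op_t *next = op->next;
        op->fn(op->arg);
        free(op);
        op = next;
        ++executed;
    }
    return executed;
}

/* Stand-in operation for illustration: counts how often it was run. */
static void count_op(void *arg) { ++*(int *)arg; }
```

With this shape, a device manager thread calling `ce_wrap_submit(count_op, &n)` only enqueues, and the operation runs once the communication thread calls `ce_wrap_progress()`. A real version would have to wake the communication thread and hand out CE-managed buffers instead of raw `void *` arguments.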

devreal avatar Dec 06 '24 16:12 devreal

I'm very confused about this issue. First, because the parsec_ce was never meant for applications, but was created to isolate the communication library (be it MPI, LCI or UCX) itself from the communication needs of PaRSEC. Second, PaRSEC has a proper mechanism to thread-shift work onto the communication thread, but TTG made the decision not to use it. So, what secondary thread-shifting mechanism do you think is needed?

bosilca avatar Dec 06 '24 17:12 bosilca

You are talking about the remote deps dequeue? As I said, there is no way for TTG to make use of that in its current form because we have to transfer more than a single blob of data. Communication in TTG is more complex than what remote deps can provide us with. Can we go and extend PaRSEC's remote deps to accommodate TTG? Maybe. But the ROI will be small and there are so many other things that are more important. That "proper way" is proper for PTG and DTD but not usable for TTG. I remember that the parsec_ce was advertised to users as a tool for active messages, so it's not just internal.

The remote deps can continue to use the callback function pointers in the parsec_ce structure. I just need a layer around it so I don't have to.

devreal avatar Dec 06 '24 18:12 devreal

I wrote down the reason why "the proper way" doesn't work for TTG: https://github.com/ICLDisco/parsec/issues/714

devreal avatar Dec 06 '24 23:12 devreal