
The SIGPIPE signals we issue cause early termination in distributed exec mode.

Open · tammam1998 opened this issue 3 years ago · 0 comments

When using distributed execution mode, the subgraph scripts terminate early, which produces incorrect results. I traced the issue to the termination code used in eager.sh and dgsh_tee.sh: it seems that the SIGPIPE is sent before the sockets finish writing. I commented out the code causing this in #537.
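To make the suspected ordering problem concrete, here is a small POSIX-only sketch of the same race. It is not the code from eager.sh or dgsh_tee.sh. It is a hypothetical Python stand-in: `WRITER` plays the role of a subgraph that is still producing output, and `run_writer` plays the role of cleanup code that either signals the writer early or waits for it to finish. Both names are illustrative assumptions.

```python
import signal
import subprocess
import sys
import time
from typing import List, Optional

# Hypothetical stand-in for a subgraph writer. CPython ignores SIGPIPE at
# startup, so the script restores the default action (terminate) first,
# matching how a typical shell pipeline stage reacts to SIGPIPE.
WRITER = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "for i in range({count}):\n"
    "    print(i, flush=True)\n"
    "    time.sleep({delay})\n"
)


def run_writer(count: int, delay: float, signal_after: Optional[float]) -> List[str]:
    """Run the writer and return every line it managed to emit.

    If `signal_after` is set, SIGPIPE is sent after that many seconds,
    mimicking cleanup that fires before the writer is done. If it is None,
    the writer is allowed to exit on its own before we stop reading.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", WRITER.format(count=count, delay=delay)],
        stdout=subprocess.PIPE,
        text=True,
    )
    if signal_after is not None:
        time.sleep(signal_after)
        proc.send_signal(signal.SIGPIPE)  # premature termination
    out, _ = proc.communicate()  # drain what was written, then reap the child
    return out.split()


if __name__ == "__main__":
    print("waited:   ", len(run_writer(20, 0.05, None)), "lines")
    print("signalled:", len(run_writer(20, 0.05, 0.3)), "lines")
```

With the early signal, the reader only sees the lines written before the signal arrived, which is the kind of truncated output described above; waiting for the writer to exit first yields the full stream. This only illustrates the race in isolation and makes no claim about how the pash scripts should be fixed.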

It might also be worth rechecking the test cases that originally required this solution, since several recent fixes in autosplit and r_split resolved issues with files hanging and with SIGPIPE signals.

tammam1998, Apr 22 '22 00:04