Remove cuda event deadlocking issues in device mr tests

Open robertmaynard opened this issue 3 years ago • 1 comments

We fixed the deadlocks caused by an assumption that std::mutex would have fair scheduling, and worked around the deadlocks found when CUDA events are created in very short-lived threads (< 10 ms).
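For context on the first issue: the C++ standard does not guarantee which waiting thread acquires a `std::mutex` next, so a test that relies on threads taking turns can stall or deadlock. The following is a minimal sketch of one way to make the hand-off explicit with a condition variable. It is not the code from this PR, and `run_alternating` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper, not taken from the RMM test suite.
// Two threads alternate strictly. The order is enforced by the `turn`
// variable and a condition variable, not by any assumed mutex fairness.
std::vector<int> run_alternating(int rounds)
{
  assert(rounds >= 0);

  std::mutex mtx;
  std::condition_variable cv;
  int turn = 0;            // id of the thread that is allowed to proceed
  std::vector<int> order;  // records which thread ran at each step

  auto worker = [&](int id) {
    for (int i = 0; i < rounds; ++i) {
      std::unique_lock<std::mutex> lock(mtx);
      // Sleep until it is this thread's turn; the predicate guards
      // against spurious wakeups.
      cv.wait(lock, [&] { return turn == id; });
      order.push_back(id);
      turn = 1 - id;  // hand the turn to the other thread
      lock.unlock();
      cv.notify_all();
    }
  };

  std::thread a(worker, 0);
  std::thread b(worker, 1);
  a.join();
  b.join();
  return order;
}
```

Calling `run_alternating(3)` should record the thread ids in strict alternation starting with thread 0, regardless of how the OS schedules the two threads.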

robertmaynard • Sep 22 '22 21:09

@ajschmidt8 please test on ARM before we merge.

harrism • Sep 23 '22 23:09

I never tested the problematic code outside of CI, so I have no way of verifying whether this fix works as intended. I'll defer to the devs for the approvals here. If this fix looks good to everyone else, let's get it merged and Ops will add these changes to our GitHub Actions POC PR to see if we still experience any issues.

ajschmidt8 • Sep 27 '22 15:09

@gpucibot merge

ajschmidt8 • Sep 27 '22 15:09